Chapter 1 Introduction to Statistics PDF
Document Details
Ms. Sara Asad
Summary
This document provides an introduction to statistics, covering fundamental concepts such as population, sample, and variables. It details different types of scales, data, and notation. The document also explores the role of descriptive and inferential statistics, and sampling error in research.
Full Transcript
Chapter 1: Introduction to Statistics
Course Instructor: Ms. Sara Asad

Statistics
The term statistics refers to a set of mathematical procedures for organizing, summarizing, and interpreting information. Statistics serve two general purposes:
1. Statistics are used to organize and summarize the information so that the researcher can see what happened in the research study and can communicate the results to others.
2. Statistics help the researcher to answer the questions that initiated the research by determining exactly what general conclusions are justified based on the specific results that were obtained.

Population
The entire group of individuals that a researcher wishes to study is called the population.
A population can be quite large, for example the entire set of women on the planet Earth. A researcher might be more specific, limiting the population for study to women who are registered voters in the United States. Perhaps the investigator would like to study the population consisting of women who are heads of state. Populations can vary in size from extremely large to very small, depending on how the investigator defines the population. The population being studied should always be identified by the researcher. In addition, the population need not consist of people: it could be a population of rats, corporations, parts produced in a factory, or anything else an investigator wants to study.
In practice, populations are typically very large, such as the population of college sophomores in the United States or the population of small businesses. Because populations tend to be very large, it is usually impossible for a researcher to examine every individual in the population of interest. Therefore, researchers typically select a smaller, more manageable group from the population and limit their studies to the individuals in the selected group.

Sample
Usually populations are so large that a researcher cannot examine the entire group.
Therefore, a sample is selected to represent the population in a research study. The goal is to use the results obtained from the sample to help answer questions about the population.
Definition: A sample is a set of individuals selected from a population, usually intended to represent the population in a research study.

Variables
A variable is a characteristic or condition that can change or has different values for different individuals. For example, a researcher may be interested in the influence of the weather on people's moods. As the weather changes, do people's moods also change? Something that can change or have different values is called a variable.

Data
To demonstrate changes in variables, it is necessary to make measurements of the variables being examined. Data (plural) are measurements or observations. A data set is a collection of measurements or observations. A datum (singular) is a single measurement or observation and is commonly called a score or raw score.
The measurements obtained in a research study are called the data. The goal of statistics is to help researchers organize and interpret the data.

Parameters & Statistics
A parameter is a value, usually a numerical value, that describes a population. A parameter is usually derived from measurements of the individuals in the population.
A statistic is a value, usually a numerical value, that describes a sample. A statistic is usually derived from measurements of the individuals in the sample.

Descriptive and Inferential Statistical Methods
Descriptive statistics are statistical procedures used to summarize, organize, and simplify data.
Inferential statistics consist of techniques that allow us to study samples and then make generalizations about the population from which they were selected.

Sampling Error
The discrepancy between a sample statistic and its population parameter is called sampling error.
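The relationship between a parameter, a statistic, and sampling error can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population values are made up (simulated nightly sleep in hours), and the function name `sampling_error` is hypothetical, not part of the course material.

```python
import random
from statistics import mean

def sampling_error(population, n, rng):
    """Return (statistic, parameter, error) for one random sample of size n."""
    parameter = mean(population)        # describes the whole population
    sample = rng.sample(population, n)  # simple random sample, without replacement
    statistic = mean(sample)            # describes only the sample
    return statistic, parameter, statistic - parameter

# Hypothetical population: nightly sleep (hours) for 1,000 students, simulated values.
rng = random.Random(0)
population = [round(rng.gauss(7.0, 1.0), 1) for _ in range(1000)]

# Draw three samples of 50 students and compare each sample mean to the population mean.
for _ in range(3):
    stat, param, err = sampling_error(population, 50, rng)
    print(f"sample mean = {stat:.2f}, population mean = {param:.2f}, sampling error = {err:+.2f}")
```

Each pass through the loop draws a different sample, so the sample mean (a statistic) typically differs a little from the fixed population mean (the parameter). That difference is the sampling error for that sample.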
The unpredictable, unsystematic differences that exist from one sample to another are an example of sampling error. Defining and measuring sampling error is a large part of inferential statistics.
Imagine that your statistics class is separated into two groups by drawing a line from front to back through the middle of the room. Now imagine that you compute the average age (or height, or IQ) for each group. Will the two groups have exactly the same average? Almost certainly they will not. No matter what you choose to measure, you will probably find some difference between the two groups. However, the difference you obtain does not necessarily mean that there is a systematic difference between the two groups. For example, if the average age for students on the right-hand side of the room is higher than the average for students on the left, it is unlikely that some mysterious force has caused the older people to gravitate to the right side of the room. Instead, the difference is probably the result of random factors such as chance. The unpredictable, unsystematic differences that exist from one sample to another are an example of sampling error.

Learning Check
A researcher is interested in the sleeping habits of American college students. A group of 50 students is interviewed and the researcher finds that these students sleep an average of 6.7 hours per day. For this study, the average of 6.7 hours is an example of a(n) __________.
a. Parameter
b. Statistic
c. Population
d. Sample

Learning Check
A researcher is curious about the average IQ of registered voters in the state of Florida. The entire group of registered voters in the state is an example of a __________.
a. Sample
b. Statistic
c. Population
d. Parameter

Learning Check
Statistical techniques that summarize, organize, and simplify data are classified as __________.
a. Population statistics
b. Sample statistics
c. Descriptive statistics
d. Inferential statistics

Learning Check
In general, __________ statistical techniques are used to summarize the data from a research study and __________ statistical techniques are used to determine what conclusions are justified by the results.
a. Inferential, descriptive
b. Descriptive, inferential
c. Sample, population
d. Population, sample

Constructs and Operational Definitions
Constructs are internal attributes or characteristics that cannot be directly observed but are useful for describing and explaining behavior.
An operational definition identifies a measurement procedure (a set of operations) for measuring an external behavior and uses the resulting measurements as a definition and a measurement of a hypothetical construct.
An operational definition has two components. First, it describes a set of operations for measuring a construct. Second, it defines the construct in terms of the resulting measurements.

Psychological Distress
Conceptual Definition. Psychological distress is characterized by symptoms of anxiety (e.g., feeling tense, restlessness, and sleep disturbances) and depression (e.g., sadness, loss of interest, and helplessness). Psychological distress is a state that reflects a person's vulnerability to general psychopathology, with a blend of anxiety and depressive symptomatology.
Operational Definition. Participants who score ≥ 19 on the Kessler Psychological Distress Scale.

Types of Variables
Variables can be classified as discrete or continuous.
Discrete variables (such as class size) consist of separate, indivisible categories. No values can exist between two neighboring categories.
Continuous variables (such as time or weight) are infinitely divisible into whatever units a researcher may choose. For example, time can be measured to the nearest minute, second, half-second, etc.
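As a rough illustration of how the chosen unit changes the recorded score for a continuous variable, the sketch below rounds one elapsed time to the nearest minute, second, and half-second. The elapsed time of 754.3 seconds and the helper name `record` are assumptions made for this example only.

```python
def record(value, unit):
    """Round a continuous measurement to the nearest multiple of `unit`."""
    return round(value / unit) * unit

elapsed_seconds = 754.3  # hypothetical true elapsed time, in seconds

nearest_minute = record(elapsed_seconds, 60)        # 780 seconds, i.e. 13 minutes
nearest_second = record(elapsed_seconds, 1)         # 754 seconds
nearest_half_second = record(elapsed_seconds, 0.5)  # 754.5 seconds

print(nearest_minute, nearest_second, nearest_half_second)
```

The underlying time is the same in all three cases; only the measurement unit differs. A discrete variable such as class size has no counterpart to this choice, because no values exist between neighboring categories.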
Real Limits
To define the units for a continuous variable, a researcher must use real limits, which are boundaries located exactly halfway between adjacent categories.
Real limits are the boundaries of intervals for scores that are represented on a continuous number line. The real limit separating two adjacent scores is located exactly halfway between the scores. Each score has two real limits. The upper real limit is at the top of the interval, and the lower real limit is at the bottom.

Two other factors apply to continuous variables:
1. When measuring a continuous variable, it should be very rare to obtain identical measurements for two different individuals. Because a continuous variable has an infinite number of possible values, it should be almost impossible for two people to have exactly the same score. If the data show a substantial number of tied scores, then you should suspect that the measurement procedure is very crude or that the variable is not really continuous.
2. When measuring a continuous variable, each measurement category is actually an interval that must be defined by boundaries. For example, two people who both claim to weigh 150 pounds are probably not exactly the same weight. However, they are both around 150 pounds. One person may actually weigh 149.6 and the other 150.3. Thus, a score of 150 is not a specific point on the scale but instead is an interval. To differentiate a score of 150 from a score of 149 or 151, we must set up boundaries on the scale of measurement. These boundaries are called real limits and are positioned exactly halfway between adjacent scores. Thus, a score of X = 150 pounds is actually an interval bounded by a lower real limit of 149.5 at the bottom and an upper real limit of 150.5 at the top. Any individual whose weight falls between these real limits will be assigned a score of X = 150.

Scales of Measurement
Data collection requires that we make measurements of our observations.
Measurement involves assigning individuals or events to categories. The categories can simply be names such as male/female or employed/unemployed, or they can be numerical values such as 68 inches or 175 pounds. The categories used to measure a variable make up a scale of measurement, and the relationships between the categories determine different types of scales. The distinctions among the scales are important because they identify the limitations of certain types of measurements and because certain statistical procedures are appropriate for scores that have been measured on some scales but not on others.

Nominal Scale
The word nominal means "having to do with names." Measurement on a nominal scale involves classifying individuals into categories that have different names but are not related to each other in any systematic way. For example, if you were measuring the academic majors for a group of college students, the categories would be art, biology, business, chemistry, and so on. Each student would be classified in one category according to his or her major. The measurements from a nominal scale allow us to determine whether two individuals are different, but they do not identify either the direction or the size of the difference. If one student is an art major and another is a biology major, we can say that they are different, but we cannot say that art is "more than" or "less than" biology and we cannot specify how much difference there is between art and biology. Other examples of nominal scales include classifying people by race, gender, or occupation.
Although the categories on a nominal scale are not quantitative values, they are occasionally represented by numbers. For example, the rooms or offices in a building may be identified by numbers. You should realize that the room numbers are simply names and do not reflect any quantitative information. Room 109 is not necessarily bigger than Room 100 and certainly not 9 points bigger.
It is also fairly common to use numerical values as a code for nominal categories when data are entered into computer programs. For example, the data from a survey may code males with a 0 and females with a 1. Again, the numerical values are simply names and do not represent any quantitative difference.

Ordinal Scale
An ordinal scale consists of a set of categories that are organized in an ordered sequence. Measurements on an ordinal scale rank observations in terms of size or magnitude.
Often, an ordinal scale consists of a series of ranks (first, second, third, and so on) like the order of finish in a horse race. Occasionally, the categories are identified by verbal labels like small, medium, and large drink sizes at a fast-food restaurant. In either case, the fact that the categories form an ordered sequence means that there is a directional relationship between categories. With measurements from an ordinal scale, you can determine whether two individuals are different and you can determine the direction of difference. However, ordinal measurements do not allow you to determine the size of the difference between two individuals. In a NASCAR race, for example, the first-place car finished faster than the second-place car, but the ranks don't tell you how much faster. Other examples of ordinal scales include socioeconomic class (upper, middle, lower) and T-shirt sizes (small, medium, large).

Interval & Ratio Scale
Both an interval scale and a ratio scale consist of ordered categories that are all intervals of exactly the same size. The factor that differentiates an interval scale from a ratio scale is the nature of the zero point.
An interval scale has an arbitrary zero point. That is, the value 0 is assigned to a particular location on the scale simply as a matter of convenience or reference. In particular, a value of zero does not indicate a total absence of the variable being measured. For example, a temperature of 0° Fahrenheit does not mean that there is no temperature, and it does not prohibit the temperature from going even lower.
Interval scales with an arbitrary zero point are relatively rare. The two most common examples are the Fahrenheit and Celsius temperature scales.
A ratio scale is anchored by a zero point that is not arbitrary but rather is a meaningful value representing none (a complete absence) of the variable being measured. The existence of an absolute, non-arbitrary zero point means that we can measure the absolute amount of the variable; that is, we can measure the distance from 0. This makes it possible to compare measurements in terms of ratios. With a ratio scale, we can measure the direction and the size of the difference between two measurements and we can describe the difference in terms of a ratio. Ratio scales are quite common and include physical measures such as height and weight, as well as variables such as reaction time or the number of errors on a test.

Learning Check
An operational definition is used to __________.
a. Define
b. Measure
c. Measure and define
d. None of the other choices is correct

Learning Check
A researcher studies the factors that determine the number of children that couples decide to have. The variable, number of children, is an example of a __________ variable.
a. Discrete
b. Continuous
c. Nominal
d. Ordinal

Learning Check
The teacher in a communication class asks students to identify their favorite television show. The different television shows make up a __________ scale of measurement.
a. Nominal
b. Ordinal
c. Interval
d. Ratio

Notation
The individual measurements or scores obtained for a research participant will be identified by the letter X (or X and Y if there are multiple scores for each individual).
The number of scores in a data set will be identified by N for a population or n for a sample.
Summing a set of values is a common operation in statistics and has its own notation. The Greek letter sigma, Σ, will be used to stand for "the sum of." For example, ΣX identifies the sum of the scores.

Order of Operations
1. All calculations within parentheses are done first.
2. Squaring or raising to other exponents is done second.
3. Multiplying and dividing are done third, and should be completed in order from left to right.
4. Summation with the Σ notation is done next.
5. Any additional adding and subtracting is done last and should be completed in order from left to right.

For the following set of scores, find the value of each expression.
Scores (X): 3, 2, 5, 1, 3
Expressions: ΣX², (ΣX)², Σ(X – 1), Σ(X – 1)²
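The order of operations can be checked with a short Python sketch. It assumes the scores in the exercise are X = 3, 2, 5, 1, 3 (read from the X column of the slide), and the variable names are chosen here for illustration only.

```python
# Assumed scores from the exercise (the X column): 3, 2, 5, 1, 3
scores = [3, 2, 5, 1, 3]

sum_x = sum(scores)                                    # ΣX: add the scores
sum_of_squares = sum(x ** 2 for x in scores)           # ΣX²: square each score, then sum
square_of_sum = sum(scores) ** 2                       # (ΣX)²: sum inside the parentheses, then square
sum_x_minus_1 = sum(x - 1 for x in scores)             # Σ(X – 1): subtract 1 from each score, then sum
sum_sq_x_minus_1 = sum((x - 1) ** 2 for x in scores)   # Σ(X – 1)²: subtract, square, then sum

print(sum_x)             # 14
print(sum_of_squares)    # 48
print(square_of_sum)     # 196
print(sum_x_minus_1)     # 9
print(sum_sq_x_minus_1)  # 25
```

Note that ΣX² and (ΣX)² give different results (48 versus 196) because squaring comes before summation unless parentheses force the sum to be computed first.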