Introduction to Statistics Lecture PDF
Uploaded by CheapestPelican
Dawson College
Summary
This document provides a basic introduction to statistics, defining key terms and concepts like data, data sets, population, and sample. It also introduces different types of data and measurement levels as well as common sampling techniques.
Full Transcript
What is Data?
Data consist of information coming from observations, counts, measurements, or responses.

Introduction to Statistics
“People who eat three daily servings of whole grains have been shown to reduce their risk of…stroke by 37%.” (Source: Whole Grains Council)
“Seventy percent of the 1500 U.S. spinal cord injuries to minors result from vehicle accidents, and 68 percent were not wearing a seatbelt.” (Source: UPI)

What is Statistics?
Statistics: The science of collecting, organizing, analyzing, and interpreting data in order to make decisions.

Data Sets
Population: The collection of all outcomes, responses, measurements, or counts that are of interest.
Sample: A subset of the population.

Example: Identifying Data Sets
In a recent survey, 1708 adults in the United States were asked if they think global warming is a problem that requires immediate government action. Nine hundred thirty-nine of the adults said yes. Identify the population and the sample. Describe the data set. (Adapted from: Pew Research Center)

Solution: Identifying Data Sets
The population consists of the responses of all adults in the U.S. The sample consists of the responses of the 1708 adults in the U.S. in the survey. The sample is a subset of the responses of all adults in the U.S. The data set consists of 939 yes’s and 769 no’s.
[Figure: “Responses of adults in the U.S. (population)” containing “Responses of adults in survey (sample)”]

Parameter and Statistic
Parameter: A number that describes a population characteristic. Example: the average age of all people in the United States.
Statistic: A number that describes a sample characteristic.

Example: Distinguish Parameter and Statistic
Decide whether the numerical value describes a population parameter or a sample statistic.
1. A recent survey of a sample of MBAs reported that the average salary for an MBA is more than $82,000. (Source: The Wall Street Journal)
Solution:
Sample statistic (the average of $82,000 is based on a subset of the population). Likewise, the average age of people from a sample of three states is a statistic.

Example: Distinguish Parameter and Statistic
Decide whether the numerical value describes a population parameter or a sample statistic.
2. Starting salaries for the 667 MBA graduates from the University of Chicago Graduate School of Business increased 8.5% from the previous year.
Solution: Population parameter (the percent increase of 8.5% is based on all 667 graduates’ starting salaries)

Branches of Statistics
Descriptive statistics: Involves organizing, summarizing, and displaying data (e.g., tables, charts, averages).
Inferential statistics: Involves using sample data to draw conclusions about a population.

Example: Descriptive and Inferential Statistics
Decide which part of the study represents the descriptive branch of statistics. What conclusions might be drawn from the study using inferential statistics?
A large sample of men, aged 48, was studied for 18 years. For unmarried men, approximately 70% were alive at age 65. For married men, 90% were alive at age 65. (Source: The Journal of Family Issues)

Solution: Descriptive and Inferential Statistics
Descriptive statistics involves statements such as “For unmarried men, approximately 70% were alive at age 65” and “For married men, 90% were alive at 65.”
A possible inference drawn from the study is that being married is associated with a longer life for men.

Types of Data
Qualitative data: Consists of attributes, labels, or nonnumerical entries (e.g., major, place of birth, eye color).
Quantitative data: Numerical measurements or counts (e.g., age, weight of a letter, temperature).

Example: Classifying Data by Type
The base prices of several vehicles are shown in the table. Which data are qualitative data and which are quantitative data?
(Source: Ford Motor Company)

Solution: Classifying Data by Type
Qualitative data: the names of the vehicle models are nonnumerical entries.
Quantitative data: the base prices of the vehicle models are numerical entries.

Levels of Measurement
Nominal level of measurement: Qualitative data only. Data are categorized using names, labels, or qualities. No mathematical computations can be made.
Ordinal level of measurement: Qualitative or quantitative data. Data can be arranged in order, but differences between data entries are not meaningful.

Example: Classifying Data by Level
Two data sets are shown. Which data set consists of data at the nominal level? Which data set consists of data at the ordinal level? (Source: Nielsen Media Research)

Solution: Classifying Data by Level
Ordinal level: the data set lists the rank of five TV programs. The data can be ordered, but the difference between ranks is not meaningful.
Nominal level: the data set lists the call letters of each network affiliate. Call letters are names of network affiliates.

Levels of Measurement
Interval level of measurement: Quantitative data. Data can be ordered, and differences between data entries are meaningful. Zero represents a position on a scale; it is not an inherent zero (zero does not imply “none”).
Ratio level of measurement: Similar to the interval level, except that a zero entry is an inherent zero (it implies “none”). A ratio of two data values can be formed, so one data value can be expressed as a multiple of another.

Example: Classifying Data by Level
Two data sets are shown. Which data set consists of data at the interval level? Which data set consists of data at the ratio level? (Source: Major League Baseball)

Solution: Classifying Data by Level
Interval level: quantitative data. You can find a difference between two dates, but a ratio does not make sense.
Ratio level: you can find differences and write ratios.

Summary of Four Levels of Measurement
Level of     Put data in  Arrange data  Subtract     Determine if one data value
measurement  categories   in order      data values  is a multiple of another
Nominal      Yes          No            No           No
Ordinal      Yes          Yes           No           No
Interval     Yes          Yes           Yes          No
Ratio        Yes          Yes           Yes          Yes

Designing a Statistical Study
1. Identify the variable(s) of interest (the focus) and the population of the study.
2. Develop a detailed plan for collecting data. If you use a sample, make sure the sample is representative of the population.
3. Collect the data.
4. Describe the data using descriptive statistics techniques.
5. Interpret the data and make decisions about the population using inferential statistics.
6. Identify any possible errors.

Data Collection
Observational study: A researcher observes and measures characteristics of interest of part of a population. Example: Researchers observed and recorded the mouthing behavior on nonfood objects of children up to three years old. (Source: Pediatric Magazine)

Experiment: A treatment is applied to part of a population and responses are observed. Example: An experiment was performed in which diabetics took cinnamon extract daily while a control group took none. After 40 days, the diabetics who had the cinnamon reduced their risk of heart disease while the control group experienced no change. (Source: Diabetes Care)

Simulation: Uses a mathematical or physical model to reproduce the conditions of a situation or process. Often involves the use of computers. Example: Automobile manufacturers use simulations with dummies to study the effects of crashes on humans.

Survey: An investigation of one or more characteristics of a population. Commonly done by interview, mail, or telephone. Example: A survey is conducted on a sample of female physicians to determine whether the primary reason for their career choice is financial stability.

Example: Methods of Data Collection
Consider the following statistical studies. Which method of data collection would you use to collect data for each study?
1. A study of the effect of changing flight patterns on the number of airplane accidents.
Solution: Simulation (it is impractical to create this situation)
2. A study of the effect of eating oatmeal on lowering blood pressure.
Solution: Experiment (measure the effect of a treatment, eating oatmeal)
3. A study of how fourth grade students solve a puzzle.
Solution: Observational study (observe and measure certain characteristics of part of a population)
4. A study of U.S. residents’ approval rating of the U.S. president.
Solution: Survey (ask “Do you approve of the way the president is handling his job?”)

Key Elements of Experimental Design
Control, randomization, and replication.

Key Elements of Experimental Design: Control
Control for effects other than the one being measured.
Confounding variables: Confounding occurs when an experimenter cannot tell the difference between the effects of different factors on a variable. Example: A coffee shop owner remodels her shop at the same time a nearby mall has its grand opening. If business at the coffee shop increases, it cannot be determined whether it is because of the remodeling or the new mall.
Placebo effect: A subject reacts favorably to a placebo when in fact he or she has been given no medical treatment at all.
Blinding: A technique where the subject does not know whether he or she is receiving a treatment or a placebo.
Double-blind experiment: Neither the subject nor the experimenter knows if the subject is receiving a treatment or a placebo.

Key Elements of Experimental Design: Replication
Replication is the repetition of an experiment using a large group of subjects. Example: To test a vaccine against a strain of influenza, 10,000 people are given the vaccine and another 10,000 people are given a placebo. Because of the sample size, the effectiveness of the vaccine would most likely be observed.

Example: Experimental Design
A company wants to test the effectiveness of a new gum developed to help people quit smoking. Identify a potential problem with the given experimental design and suggest a way to improve it.
The company identifies one thousand adults who are heavy smokers. The subjects are divided into blocks according to gender. After two months, the female group has a significant number of subjects who have quit smoking.

Solution: Experimental Design
Problem: The groups are not similar. The new gum may have a greater effect on women than men, or vice versa.
Correction: The subjects can be divided into blocks according to gender, but then, within each block, they must be randomly assigned to be in the treatment group or the control group.

Sampling Techniques
Simple random sample: Every possible sample of the same size has the same chance of being selected.

Simple Random Sample
Assign a number to each member of the population, then generate random numbers using a random number table, a software program, or a calculator. Members of the population that correspond to these numbers become members of the sample.

Example: Simple Random Sample
There are 731 students currently enrolled in statistics at your school. You wish to form a sample of eight students to answer some survey questions. Select the students who will belong to the simple random sample.
Assign numbers 1 to 731 to the students taking statistics. On the table of random numbers, choose a starting place at random (suppose you start in the third row, second column).
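The example above draws the sample with a printed random number table, and the slides note that a software program can generate the random numbers instead. The sketch below is one possible software version, assuming Python's standard `random` module (the slides name no particular tool). The function name `simple_random_sample` and the seed value are hypothetical, chosen only for illustration.

```python
import random


def simple_random_sample(population_size, sample_size, seed=None):
    """Draw sample_size distinct member numbers from 1..population_size.

    random.Random.sample selects without replacement, so (given a good random
    source) every set of sample_size numbers is equally likely to be chosen,
    which is the defining property of a simple random sample.
    """
    if not 0 < sample_size <= population_size:
        raise ValueError("sample_size must be between 1 and population_size")
    rng = random.Random(seed)  # a fixed seed makes a run reproducible; None uses fresh entropy
    members = range(1, population_size + 1)  # members numbered 1 to population_size
    return sorted(rng.sample(members, sample_size))


# Hypothetical run for the example: 731 students numbered 1 to 731, sample of eight.
sample = simple_random_sample(731, 8, seed=2024)
print(sample)
```

Because the seed is arbitrary, the eight numbers will generally differ from those read off a printed table. Any set of eight distinct numbers from 1 to 731 produced this way is an equally valid simple random sample.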
Solution: Simple Random Sample
Read the digits in groups of three and ignore numbers greater than 731. The students assigned numbers 719, 662, 650, 4, 53, 589, 403, and 129 would make up the sample.

Other Sampling Techniques
Stratified sample: Divide a population into groups (strata) and select a random sample from each group. Example: To collect a stratified sample of the number of people who live in West Ridge County households, you could divide the households into socioeconomic levels and then randomly select households from each level.

Cluster sample: Divide the population into groups (clusters) and select all of the members in one or more, but not all, of the clusters. Example: In the West Ridge County example, you could divide the households into clusters according to zip codes, then select all the households in one or more, but not all, zip codes.

Systematic sample: Choose a starting value at random, then choose every kth member of the population. Example: In the West Ridge County example, you could assign a different number to each household, randomly choose a starting number, then select every 100th household.

Example: Identifying Sampling Techniques
You are doing a study to determine the opinion of students at your school regarding stem cell research. Identify the sampling technique used.
1. You divide the student population with respect to majors and randomly select and question some students in each major.
Solution: Stratified sampling (the students are divided into strata, here majors, and a sample is selected from each major)
2. You assign each student a number and generate random numbers. You then question each student whose number is randomly selected.
Solution: Simple random sample (each sample of the same size has an equal chance of being selected, and each student has an equal chance of being selected)
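The stratified, cluster, and systematic techniques described above can be sketched side by side. The population below is a hypothetical stand-in for West Ridge County: 1,000 invented households with made-up socioeconomic levels and zip codes (labels `Z1` to `Z5`), since the slides give no actual data. The helper names, the sample sizes (10 households per stratum, 2 zip-code clusters, every 100th household), and the use of Python's `random` module are all illustrative assumptions.

```python
import random
from collections import defaultdict


def group_by(units, key):
    """Split units into groups (strata or clusters) using a key function."""
    groups = defaultdict(list)
    for unit in units:
        groups[key(unit)].append(unit)
    return groups


def stratified_sample(units, stratum_of, per_stratum, rng):
    """Divide units into strata and draw a simple random sample from EACH stratum."""
    sample = []
    for members in group_by(units, stratum_of).values():
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample


def cluster_sample(units, cluster_of, n_clusters, rng):
    """Divide units into clusters, randomly choose some (not all), keep every member."""
    clusters = group_by(units, cluster_of)
    if not 0 < n_clusters < len(clusters):
        raise ValueError("choose at least one cluster, but not all of them")
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for key in chosen for unit in clusters[key]]


def systematic_sample(units, k, rng):
    """Choose a random start among the first k units, then take every kth unit."""
    start = rng.randrange(k)
    return units[start::k]


# Hypothetical West Ridge County population (invented, not real data):
# household IDs 1..1000 with a made-up socioeconomic level and zip code.
LEVELS = ["low", "middle", "high"]
ZIP_CODES = ["Z1", "Z2", "Z3", "Z4", "Z5"]
households = [
    {"id": i, "level": LEVELS[i % 3], "zip": ZIP_CODES[i % 5]}
    for i in range(1, 1001)
]

rng = random.Random(7)  # fixed seed so the run is reproducible
strat = stratified_sample(households, lambda h: h["level"], 10, rng)
clus = cluster_sample(households, lambda h: h["zip"], 2, rng)
syst = systematic_sample(households, 100, rng)
print(len(strat), len(clus), len(syst))  # → 30 400 10
```

Drawing a fixed number from each stratum is only one option; allocating the sample in proportion to stratum size is another, and the slides do not specify either. Starting the systematic sample at a random position among the first k households gives every household the same 1/k chance of selection.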