Introduction to Statistics Chapter 1 PDF

Summary

This is a lecture or presentation on the introduction to statistics. It covers topics like data, populations, and the difference between statistics and parameters. It also touches on the collection of data and examples of populations of interest.

Full Transcript

Introduction to Statistics Chapter 1

Chapter 1: Introduction to Statistics
Section 1-1: Introducing Data-Informed Thinking
Section 1-2: Types of Data
Section 1-3: Techniques for Collecting Data

Introducing Data-Informed Thinking (Section 1-1)

What’s in this Section?

This lesson will introduce the wider purpose of an introductory course in statistics, as well as provide an overview of the two categories of statistics, motivate using data to make decisions, and define important terms we will use throughout the course.

After studying this lesson, you should be able to:
Identify the population of interest given a description of a sample.
Distinguish between a population data set and a sample data set.
Distinguish between parameters and statistics.

Why are you taking this class?

Your degree or major requires this class. Have you thought about why?
You may use the methods discussed in this course to collect, summarize, analyze, and interpret data.
You may not do the collecting/analyzing directly, but instead read and understand summaries and reports that include data.
Even if you are not involved in much data collection or decision-making, understanding how to treat data will help you as a consumer. Data informs marketing and advertising. Data is used in politics and journalism.

What is Data?

Data: (noun) A collection of observations. Often numerical, but not always.
Numerical data is often collected from measuring or counting things.
Non-numerical data is often collected from surveys, interviews, or observations.

Grammar note: “data” is the plural form of the noun “datum,” but the singular is rarely used.

Example: In a list of heights of 24 randomly chosen college students, the list of all 24 measurements is the data, and the third measurement in the list, 5’10”, is a datum.

Data vs. Statistics

“Data” and “Statistics” (and “Facts”) are words that are often used interchangeably in casual settings, but this course treats these words precisely.
Statistics (two definitions):
(noun) The field of study that uses mathematical principles to collect, describe, and analyze data.
(noun) The plural of the word “statistic,” which is a single measure summarizing sample data. (More on this definition later in this lesson.)

The word “statistics” is often used too loosely, as in “The statistics show us that dinosaurs had feathers.” While it may be true that observations by paleontologists lead them to believe that dinosaurs had feathers, their methods and conclusions likely had little to do with statistics.

Population of Interest (1 of 3)

Population: (noun) The complete, whole group of all observations of interest.
Key word: “all.” This includes observations we don’t have access to or don’t even know exist.
We rarely have access to the full list of all data values of interest.
Collecting data describing the entire population is rarely possible, but when it is done, it is called a census.
We may add limitations or specific descriptors to further narrow down the population, and even then, still be unable to gather data for the whole group.

Examples (each of these is a different population):
The heights of all college freshmen.
The heights of all college freshmen in the U.S.
The heights of all first-year students at Wake Tech.
The heights of all first-year students at Wake Tech who are taking at least one class on the Southern Wake Campus.
(Even with a population this specific, it’s very difficult to collect population data.)

Population of Interest (2 of 3)

Gathering data for the entire population is often impossible.

Example: Consider the population of interest, “the circumferences of all trees in Umstead State Park.” Imagine the cost (in terms of money, time, and labor) involved if we were interested in measuring the circumference of every tree in Umstead Park. Suppose you took measuring tape, a clipboard, and a detailed map to Umstead Park and started measuring trees.
After measuring for a year straight, 8 hours per day, 5 days per week, you would only have a fraction of the trees measured. Meanwhile, some of the trees you measured fell down (and are no longer in the population of interest), and new trees started growing where you already measured.

Population of Interest (3 of 3)

A few clarifying points:
The word “population” in this class does not mean “number of people living in a certain area.” We refer to the number of individuals in the population as the “population size” (and we often don’t know that, either).
Population can mean the individuals we are studying (i.e., the people), but it usually means the measurements themselves (i.e., their heights).
For the group of “All Wake Tech Students,” we can have several measurements for each person, and each could be a population of interest. Heights of all Wake Tech students, weights of all Wake Tech students, GPAs of all Wake Tech students, etc. can each be a population we are interested in describing.

Sample: A Subset of the Population

Sample (two definitions):
(noun) A list or subset of observations from a larger group, the population of interest. Samples are generally collected randomly (to be discussed later in Section 1-3) and assumed to be representative (meaning, authentically reflecting the realities in the population).
(verb) The process of selecting individuals from a population to be in a sample.
Section 1-3 will discuss sampling methods in more detail.

Representative Samples (1 of 2)

Depending on how a sample is collected, it may not do a good job representing the stated population of interest. While it may not represent the stated population of interest, it could represent some other population.
Example 1: If you go to Umstead Park and measure the circumference of 100 random trees, but you only selected pine trees, then the population being described by that sample is “All pine trees in Umstead Park,” not “All trees in Umstead Park.”

Representative Samples (2 of 2)

Example 2: Suppose you are interested in describing the heights of all Wake Tech students, but all of your data is collected on the Southern Wake Campus on a Tuesday morning. Then your sample does not represent the stated population of interest, and instead may represent the population of “All Wake Tech students who are taking at least one daytime Tuesday class on Southern Wake Campus.”

You’re leaving out:
Online students
Students at other campuses
Students with a MWF class schedule
Night students

Two Main Categories of Statistics

This course will organize how we use data into two main categories:

Descriptive Statistics (Ch. 1-3, 9)
Using measures to summarize numerical data. Where is the center? How spread out are the values?
Using graphs to visualize the distribution of data. What is the shape? How is one value related to the rest?

Inferential Statistics (Ch. 7-9)
Based on limited information, what can we know about the population?
Estimation: Without measuring all college students’ heights, can we estimate what the average height is?
Hypothesis Testing: Does the data agree with or contradict a claim someone is making about the mean?

(So, what’s in Chapters 4-6?)

Chapters 4-6 will discuss probability, random variables, and probability distributions. Probability is a tool that we use to properly do inference. Chapters 4-6 will serve as a mathematical bridge between the “collecting/describing data” step and the “drawing conclusions about the population” step.

Parameters and Statistics (1 of 2)

We are often interested in a single value that summarizes all of the observations for a group.
For example, when you set out to measure the circumference of all trees in Umstead Park, you may be interested in the average circumference once you have all of those measurements.

Parameter: (noun) A single value summarizing the observations in a population data set.
Statistic: (noun) A single value summarizing the observations in a sample data set.
Remember the first letter: parameters come from populations, statistics come from samples.

Parameters and Statistics (2 of 2)

Parameters and statistics are linked. If we are interested in the average circumference for all trees in Umstead Park (a parameter), then we will collect a sample of 100 random trees in Umstead Park and find the average circumference of that group (a statistic).
“All trees in Umstead Park” is the population of interest.
“The 100 trees we measured” is the sample.
“The average circumference for all trees in Umstead Park” is the parameter of interest.
“The average circumference for the 100 trees we measured” is the statistic we’re using to estimate the parameter.

Using Data to Make Decisions

Statistics is a science which uses results from mathematics and probability to describe patterns in data and patterns from data. Much like physics, statistics is a science that uses math, but is a separate field from math.
After sample data is collected and summarized, we can make judgments about the larger population using inference.
Decisions made using data are more objective; statistics ideally removes or limits human tendencies to favor our own preferences, preconceptions, and biases.

Tying it all Together (1 of 5)

Read the scenario below, and then answer the questions.

The HR manager for a local office of a large company designs a survey to give to the employees at that office location. The company employs over 15,000 workers, but only 2861 work in this office. The manager’s survey asks the following questions:
a. What is your preference between remote work or in-person work?
b. How many dependents do you have?
c. How many cups of coffee do you drink per week?
d. What was your college GPA?

She sends the survey by email to a randomly chosen group of 250 employees at this office. After collecting the responses, she calculates the average number of dependents (2.1), cups of coffee per week (12.9), and GPA (3.08) for the responses.

1. What is the population of interest?
2. What is the sample?
3. What are the statistics that are calculated?
4. What are the parameters the manager is attempting to describe?

Tying it all Together (2 of 5)

1. What is the population of interest?
The population of interest is “all 2861 employees who work at this office.” It cannot be “all employees of the company” because the survey was only sent to workers at this office.

Tying it all Together (3 of 5)

2. What is the sample?
The sample is the responses from the 250 employees who completed the survey. If the 250 employees were randomly chosen, we can assume the sample is representative. (Meaning, no patterns exist that leave out groups in the population.)

Tying it all Together (4 of 5)

3. What are the statistics that are calculated?
Average number of dependents for the 250 responses, 2.1.
Average number of cups of coffee per week for the 250 responses, 12.9.
Average GPA for the 250 responses, 3.08.

Tying it all Together (5 of 5)

4. What are the parameters the manager is attempting to describe?
Average number of dependents for all 2861 workers at this office.
Average number of cups of coffee per week for all 2861 workers at this office.
Average GPA for all 2861 workers at this office.
(All unknown.)

What’s next?

The next section will discuss in further detail the concept of a “variable” that is measured or observed in sample data, highlighting two main types of variables we can study.
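
As a recap of the parameter and statistic distinction in the HR survey scenario, here is a minimal Python sketch. It is only an illustration under stated assumptions: the dependents counts are simulated (whole numbers from 0 to 5, chosen arbitrarily) and are not the survey's real data, and the variable names are hypothetical. In the real scenario the parameter is unknown; the simulation can compute it only because it invents the whole population.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: number of dependents for each of the 2861 workers.
# These values are simulated for illustration; they are not real survey data.
POPULATION_SIZE = 2861
SAMPLE_SIZE = 250
population = [random.randint(0, 5) for _ in range(POPULATION_SIZE)]

# Parameter: a single value summarizing the whole population.
parameter = statistics.mean(population)

# Sample: a randomly chosen subset of the population (no worker picked twice).
sample = random.sample(population, SAMPLE_SIZE)

# Statistic: the same summary, computed from the sample only.
statistic = statistics.mean(sample)

print(f"parameter (all {POPULATION_SIZE} workers): {parameter:.2f}")
print(f"statistic ({SAMPLE_SIZE} sampled workers): {statistic:.2f}")
```

Running the sketch with different seeds gives a different statistic each time, while the parameter changes only because the simulated population itself is regenerated. With a random sample, the statistic usually lands close to the parameter, which is why it can serve as an estimate.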
