Introduction to Statistics Pt1 PDF
Mohammad M. Yamin and Farhad Reza
Summary
This document provides an introduction to the fundamental concepts of statistics. It covers definitions like 'population' and 'sample.' It also explains types of variables and data collection methods.
Full Transcript
Introduction to Statistics
Mohammad M. Yamin, Ph.D., P.E. and Farhad Reza, Ph.D., P.E.

The science of statistics is
– Collecting
– Organizing
– Summarizing
– Analyzing
information to draw conclusions or answer questions.

Definitions
A population
– Is the group to be studied
– Includes all of the individuals in the group
A sample
– Is a subset of the population
– Is often used in analyses because getting access to the entire population is impractical
An individual
– Is a person or object that is a member of the population being studied

Descriptive statistics consist of organizing and summarizing data. Descriptive statistics describe data through numerical summaries, tables, and graphs.
A statistic is a numerical summary based on a sample.
A parameter is a numerical summary of a population.

Suppose the percentage of all students on your campus who have a job is 84.9%. This value represents a parameter because it is a numerical summary of a population.
Suppose a sample of 250 students is obtained, and from this sample we find that 86.3% have a job. This value represents a statistic because it is a numerical summary based on a sample.

Variables
Characteristics of the individuals under study are called variables.
– Some variables have values that are attributes or characteristics; those are called qualitative or categorical variables.
– Some variables have values that are numeric measurements; those are called quantitative variables.
Note: Arithmetic operations such as addition and subtraction can be performed on the values of a quantitative variable and provide meaningful results.

Examples of qualitative variables
– Gender
– Zip code
– Blood type
– States in the United States
– Brands of televisions
Qualitative variables have category values; those values cannot be added, subtracted, etc.
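Zip code appears in the list of qualitative variables even though it is written with digits. The short Python sketch below illustrates why: counting categories is a meaningful summary, while averaging the digits is not. All data values and function names here are hypothetical, made up for illustration only.

```python
from collections import Counter
from statistics import mean

# Hypothetical records for five individuals (invented values, not real data).
zip_codes = ["56001", "56003", "56001", "55401", "56001"]  # qualitative
children = [0, 2, 1, 3, 2]                                  # quantitative


def summarize_qualitative(values):
    """Count how often each category occurs."""
    return Counter(values)


def summarize_quantitative(values):
    """Arithmetic is meaningful for quantitative values, so a mean makes sense."""
    return mean(values)


zip_counts = summarize_qualitative(zip_codes)
mean_children = summarize_quantitative(children)

# Arithmetic on zip codes runs without error, but the result is not a
# zip code that any individual has, so it describes nothing.
meaningless_average = mean([int(z) for z in zip_codes])

print("Most common zip code:", zip_counts.most_common(1))
print("Mean number of children:", mean_children)
print("Average of zip codes (not meaningful):", meaningless_average)
```

The test for a quantitative variable is whether arithmetic on its values gives a meaningful result, not whether the values happen to be written as numbers.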
Examples of quantitative variables
– Temperature
– Height and weight
– Sales of a product
– Number of children in a family
– Points achieved playing a video game
Quantitative variables have numeric values; those values can be added, subtracted, etc.

Quantitative Variables
Quantitative variables can be either discrete or continuous.
Discrete variables
– Have a finite or countable number of possible values. Countable means the values result from counting, such as 0, 1, 2, 3, and so on.
– Are frequently counts
Continuous variables
– Have an infinite number of possible values
– Can be measured to any desired level of accuracy

Examples of discrete variables
– The number of heads obtained in 5 coin flips
– The number of cars arriving at a McDonald’s between 12:00 and 1:00
– The number of students in class
– The number of points scored in a football game
The possible values of discrete variables can be listed.

Examples of continuous variables
– The distance that a particular model car can drive on a full tank of gas
– Heights of college students
Sometimes the variable is discrete but has so many close values that it could be considered continuous:
– The number of DVDs rented per year at video stores
– The number of ants in an ant colony

Ways to Collect Data
There are different ways to collect data:
– Census
– Existing sources
– Survey sampling
– Designed experiments
These are good methods of data collection, if done correctly.

Ways to Collect Data: Census
A census is a list
– Of all the individuals in a population
– That records the characteristics of the individuals
An example is the US Census held every 10 years (this is only an example though).
Advantages
– Answers have 100% certainty
Disadvantages
– May be difficult or impossible to obtain
– Costs may be prohibitive

Ways to Collect Data: Existing Sources
An existing source is an appropriate data set that has already been collected and can be used for this study.
Advantages
– Saves time and money
Disadvantages
– There may not be an applicable data set

Ways to Collect Data: Survey Sampling
A survey sample is
– A study in which only a subset of the population is considered
– A study in which there is no attempt to influence the value of the variable of interest
Advantages
– Saves time and money
Disadvantages
– Choosing an appropriate sample could be difficult

An Observational Study
An observational study measures the value of the response variable without attempting to influence the value of either the response or explanatory variables (i.e., the researcher observes the behavior of the individuals in the study without trying to influence the outcome of the study).
A survey sample is an example of an observational study.
An observational study is also called an ex post facto (after the fact) study.
Advantages
– It can detect associations between variables
Disadvantages
– It cannot isolate causes to determine causation

A Designed Experiment
If a researcher assigns the individuals in a study to a certain group, intentionally changes the value of the explanatory variable, and then records the value of the response variable for each group, the researcher is conducting a designed experiment.

A designed experiment, then, means applying a treatment to individuals (referred to as experimental units or subjects) and attempting to isolate the effects of the treatment on a response variable.
Advantages
– Can analyze individual factors
Disadvantages
– Cannot be done when the variables cannot be controlled
– Cannot be applied in some cases for moral or ethical reasons

Differences Between the Two
Observational studies and designed experiments have some fundamental differences:
– Observational studies do not control the variable under analysis, while designed experiments do
– Because variables are uncontrolled in an observational study, the results can only be associations
– Because variables are controlled in a designed experiment, the results can be conclusions of causation

Example 1 and Example 2
[Slide content not reproduced in the transcript]

Notes on Examples 1 & 2
– In both studies, the goal of the research was to determine if radio frequencies from cell phones increase the risk of contracting brain tumors.
– Whether or not brain cancer was contracted is the response variable. The level of cell phone usage is the explanatory variable.
– In research, we wish to determine how varying the amount of an explanatory variable affects the value of a response variable.
– In Example 1, no attempt was made to influence the individuals in the study. The researcher simply observed the behavior of the individuals, so it is an observational study.
– In Example 2, each group was intentionally exposed to various levels of radiation. Since the researcher controlled the explanatory variable, it is a designed experiment.

SAMPLING
Usually only a part of the population can be analyzed. How do you choose your sample? The process is called sampling.
Types of sampling:
– Simple random sampling
– Stratified sampling
– Systematic sampling
– Cluster sampling
– Convenience sampling

Simple Random Sampling
Random sampling is the process of using chance to select individuals from a population to be included in the sample.
A sample of size n from a population of size N is obtained through simple random sampling if every possible sample of size n has an equally likely chance of occurring. The sample is then called a simple random sample.
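The definition of a simple random sample can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the population of 30 numbered individuals, the sample size of 5, the seed, and the function name `simple_random_sample` are all assumptions chosen for the example. The standard-library call `random.sample` draws without replacement so that every subset of size n is equally likely.

```python
import random
from math import comb


def simple_random_sample(population, n, seed=None):
    """Draw n distinct individuals so that every size-n subset is equally likely.

    The seed parameter is only there to make the illustration repeatable.
    """
    rng = random.Random(seed)
    return rng.sample(population, n)


# Hypothetical population: individuals numbered 1..N with N = 30.
population = list(range(1, 31))
n = 5

sample = simple_random_sample(population, n, seed=1)

# Number of distinct samples of size n that could have been drawn: C(N, n).
num_possible = comb(len(population), n)

print("Sample:", sample)
print("Possible samples of size", n, ":", num_possible)
print("Chance of any one particular sample:", 1 / num_possible)
```

With N = 30 and n = 5 there are C(30, 5) = 142,506 possible samples, and simple random sampling gives each of them the same chance of being the one selected.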
Stratified Sampling
A stratified sample is obtained when we choose a simple random sample from subgroups of a population.
– This is appropriate when the population is made up of nonoverlapping (distinct) groups called strata
– Within each stratum, the individuals are likely to have a common attribute
– Between the strata, the individuals are likely to have different attributes

Stratified Sampling
Example: polling a population about a political issue
– It is reasonable to divide up the population into Democrats, Republicans, and Independents
– It is reasonable to believe that the opinions of individuals within each party are similar
– It is reasonable to believe that the opinions differ from group to group
– Therefore it makes sense to consider each stratum separately

Systematic Sampling
A systematic sample is obtained when we choose every kth individual in a population.
– The first individual selected corresponds to a random number between 1 and k
Systematic sampling is appropriate
– When we do not have a list of all the individuals in a population

Cluster Sampling
A cluster sample is obtained when we choose a random set of groups and then select all individuals within those groups.
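The three sampling schemes described above can be compared side by side in a short Python sketch. Everything concrete here is an assumption made for illustration: the function names, the individuals, the party membership lists, the city blocks, and the choice to draw the same number of individuals from every stratum (the slides do not say how to allocate the sample across strata).

```python
import random


def stratified_sample(strata, n_per_stratum, rng):
    """Take a simple random sample from each stratum separately.

    Equal allocation per stratum is an assumption of this sketch.
    """
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}


def systematic_sample(population, k, rng):
    """Pick a random start between 1 and k, then take every kth individual."""
    start = rng.randint(1, k)          # inclusive on both ends
    return population[start - 1::k]    # convert 1-based start to 0-based index


def cluster_sample(clusters, n_clusters, rng):
    """Choose whole groups at random, then keep every individual in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [person for name in chosen for person in clusters[name]]


rng = random.Random(0)  # fixed seed so the illustration is repeatable

# Hypothetical strata: voters grouped by party.
strata = {
    "Democrat": ["D1", "D2", "D3", "D4"],
    "Republican": ["R1", "R2", "R3", "R4"],
    "Independent": ["I1", "I2", "I3"],
}
strat = stratified_sample(strata, 2, rng)

# Hypothetical population of 20 individuals in arrival order, with k = 4.
arrivals = list(range(1, 21))
syst = systematic_sample(arrivals, 4, rng)

# Hypothetical clusters: households grouped by city block.
clusters = {"block_A": [1, 2, 3], "block_B": [4, 5], "block_C": [6, 7, 8, 9]}
clus = cluster_sample(clusters, 2, rng)

print("Stratified:", strat)
print("Systematic:", syst)
print("Cluster:", clus)
```

The randomness enters at a different point in each scheme. Stratified sampling randomizes within every group, systematic sampling randomizes only the starting position, and cluster sampling randomizes which groups are taken and then includes each chosen group in full.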