Chapter 1 Definitions
University of Wisconsin-Milwaukee
Summary
This document provides definitions and explanations of basic statistical concepts, including population, sample, descriptive statistics, inferential statistics, and different types of sampling designs like simple random sampling, stratified random sampling, and cluster sampling. It also covers experimental designs. The document is suitable for an introductory-level statistics course.
Full Transcript
CHAPTER 1 – BASIC STATISTICS DEFINITIONS

POPULATION – An entire group to be studied.
Note: N = Population size (number of elements in the population)

SAMPLE – A selection or subset of members from the population, for analysis and projection back onto the population.
Note: n = Sample size (number of elements in the sample), where n ≤ N.

DESCRIPTIVE STATISTICS – A mathematical means of organizing and summarizing information.

INFERENTIAL STATISTICS – The methods for drawing, and measuring the reliability of, conclusions about a population based on a sample drawn from it.

[Diagram: Population (Entire Group to be Studied); Sampling Design (Method of Data Collection); Sample (Raw Data); Descriptive Statistics (Data Summarization), giving Information about Sample; Inferential Statistics (Conclusions Drawn about Population from Sample)]

ELEMENT – A single member of a population or sample (also called an Experimental Unit).

SAMPLING – The selection of elements from a population for study as a smaller group (the Sample), where the selection process often follows a particular methodology (the Sampling Design).

VARIABLE – A characteristic of the members of a population, for which data can be collected.

QUALITATIVE VARIABLE – Measures a non-numerical characteristic.

QUANTITATIVE VARIABLE – Measures a characteristic that can be ranked or ordered on a numerical scale.

DISCRETE QUANTITATIVE VARIABLE – Can only take on certain values, usually integers.

CONTINUOUS QUANTITATIVE VARIABLE – Can take on any value within an expected range.

Univariate Data – Only 1 variable sampled on the elements (Only 1 Characteristic Measured).
Bivariate Data – Two variables sampled on the elements.
Multivariate Data – Three or more variables sampled.

Cross-Sectional Data vs. Time-Series Data
1) Cross-Sectional Data – data sampled at a particular point in time across members of a population or sufficiently sized sample.
2) Time-Series Data – data sampled at multiple points in time for one or a few elements.

Census – collecting data from ALL members of a population.
Sampling Survey – collecting data from a sample or subset of a population.

SAMPLING DESIGN – A methodology for choosing elements from a population to form a sample. There are several different methodologies available, ranked below from most random to least random.
Note: The more random the sampling design, the more the results from studying the sample can be used to project back onto the population (make inferences).

Here are some random sampling designs.
1) Simple Random Sampling – Every element of the population has an equal likelihood of being chosen. Often uses random number generators or tables (see Appendix of Textbook).
2) Stratified Random Sampling – Selecting elements randomly from stratified subgroups within the population. Often done by selecting a number of elements from each subgroup that is proportional to the fraction of the overall population made up by the subgroup. If judgements are made to decide which elements from each subgroup and/or how many elements from each subgroup should be chosen to reflect population demographics, then this is known as QUOTA sampling.
3) Cluster Sampling – Dividing the population into (often naturally occurring) clusters, and then randomly sampling individual clusters and including all elements from each cluster sampled (one-stage cluster sampling). If subsets of each randomly chosen cluster are then further randomly sampled, this is known as two-stage cluster sampling.
4) Systematic Sampling (1-in-m sampling) – Randomly select a starting element, the kth, from among the first m elements of the population, then select every mth element afterward (k + m, k + 2m, etc.).
Simple random sampling is the most random sampling design, systematic sampling the least.

Here are two non-random sampling designs.
5) Convenience Sampling – Sampling all easily reachable or accessible elements.
6) Judgement Sampling – The experimenter decides who will be in the sample. This sample is usually designed for performance of specific studies on specific groups.

Note: All samples can have descriptive statistics run on them. However, the less random the sample, the less useful it is for making inferences about the population.

EXPERIMENTAL DESIGNS: There are two basic types of experimental designs.
1) Observational studies – researchers simply observe characteristics and take measurements.
2) Designed experiment – researchers employ treatments and controls and then observe characteristics and take measurements (often done in cause-effect studies).
-- Treatment = Experimental condition
-- Response Variable = The variable that is the experimental outcome to be measured
-- Factor = A variable whose effect on the response variable is of interest
-- Levels = Possible values of a factor
-- Treatment = A combination of levels of one or more factors
-- Treatment Group = A group receiving a specified treatment (can be more than one treatment group, each receiving its own treatment)
-- Control Group = A similar group not receiving any treatment, or receiving the baseline treatment.

Principles of experimental design:
1) Control – A method used to control for effects due to factors other than the one(s) of interest (the process of Controlled Experimentation), so the effects of treatments can be determined.
2) Randomization – Elements sampled should be randomly divided into groups to avoid unintentional Selection Bias in constituting the groups, making the control and treatment groups as similar as possible.
3) Replication – A sufficient number of subjects should be used to ensure that randomization creates groups that closely resemble each other, and to increase the chances of detecting differences among the treatments when such differences actually exist.

Completely Randomized Design – All the experimental units are assigned randomly among all the treatments.
Randomized Block Design – The experimental units are assigned randomly among all treatments separately within each block.
Blind Experiment – None of the experimental units know which treatment they are receiving; minimizes potential response bias due to individual response/attitude toward their treatment.
Double-Blind Experiment – Neither the experimental units nor the experimenters know which individuals are receiving each treatment; minimizes bias introduced by individual attitudes toward their treatments and by experimenter attitudes toward individuals based on their treatments.

EXAMPLE OF A DESIGNED EXPERIMENT:
A study was done on weight loss due to combinations of caloric intake and daily exercise level. 480 men were sampled, all from similar backgrounds. All were required to check in to the same nutritional center, get the same amount of sleep at night, avoid junk food, and eat three daily meals at prescribed times, and all were found to have similar genetic traits (i.e., no specific medical conditions like hypothyroidism, etc.). All variables the experimental units were exposed to were controlled (held constant) except for daily caloric intake and minutes of daily exercise. As a result, two factors were created:
1) Daily caloric intake – 2 levels of this factor were set up:
-- Daily caloric intake between 1500 – 2000 calories
-- Daily caloric intake between 2000 – 2500 calories
2) Daily exercise level – 3 levels of this factor were set up:
-- No exercise
-- 15 minutes of physical activity
-- 30 minutes of physical activity
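Because a treatment is a combination of one level from each factor, crossing the 2 caloric-intake levels with the 3 exercise levels gives 2 × 3 = 6 combinations. The following is a minimal sketch of that enumeration, not part of the original notes; the function and variable names are hypothetical and chosen only for illustration.

```python
from itertools import product

# Factor levels taken from the example; the variable names are illustrative.
caloric_intake_levels = ["2000 to 2500 calories", "1500 to 2000 calories"]
exercise_levels = [
    "No exercise",
    "15 minutes of physical activity",
    "30 minutes of physical activity",
]


def enumerate_treatments(*factors):
    """Return every combination that takes one level from each factor."""
    return list(product(*factors))


treatments = enumerate_treatments(caloric_intake_levels, exercise_levels)

# Print the treatments as a numbered list.
for number, (intake, exercise) in enumerate(treatments, start=1):
    print(f"{number}) Daily caloric intake between {intake}, {exercise}")
```

The first combination produced (higher caloric intake, no exercise) corresponds to the baseline condition, which is the natural candidate for a control group.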
The result was that 6 treatments were set up:
1) Control Group (Daily caloric intake between 2000 – 2500 calories, No exercise);
2) Daily caloric intake between 2000 – 2500 calories, 15 minutes of physical activity;
3) Daily caloric intake between 2000 – 2500 calories, 30 minutes of physical activity;
4) Daily caloric intake between 1500 – 2000 calories, No exercise;
5) Daily caloric intake between 1500 – 2000 calories, 15 minutes of physical activity;
6) Daily caloric intake between 1500 – 2000 calories, 30 minutes of physical activity.

This was obviously not a blind experiment from the perspective of amount of daily physical activity, so a potential bias exists: respondents' motivation/attitude toward the program may suffer if they assume that no exercise means weight loss is less likely. The 480 respondents were randomly assigned among the 6 treatment groups, so selection bias is hopefully minimized. The size of each treatment group (80 men) also makes replication of results likely, since there is an ample number of respondents per group for each treatment group to show a number of similar results for weight loss.
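The random assignment of 480 respondents to 6 groups of 80 follows the Completely Randomized Design defined earlier. The sketch below shows one way such an assignment could be carried out; it is an illustration only, not the procedure the study used. The subject IDs (1 to 480), the function name, and the fixed seed are all assumptions.

```python
import random


def completely_randomized_assignment(units, n_treatments, seed=None):
    """Shuffle the units, then deal them into equally sized treatment groups.

    Returns a dict mapping treatment number (1-based) to a list of units.
    """
    if len(units) % n_treatments != 0:
        raise ValueError("units must divide evenly among the treatments")
    rng = random.Random(seed)  # seed is optional; used here for repeatability
    shuffled = list(units)     # copy so the caller's list is left unchanged
    rng.shuffle(shuffled)
    group_size = len(shuffled) // n_treatments
    return {
        t + 1: shuffled[t * group_size:(t + 1) * group_size]
        for t in range(n_treatments)
    }


# Hypothetical subject IDs standing in for the 480 men in the example.
subject_ids = list(range(1, 481))
groups = completely_randomized_assignment(subject_ids, n_treatments=6, seed=1)

for treatment, members in groups.items():
    print(f"Treatment {treatment}: {len(members)} subjects")
```

Shuffling once and slicing guarantees equal group sizes, whereas assigning each subject independently at random would usually produce groups of slightly different sizes.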