Lecture Notes on Theory Sampling - STAT 443 - PDF

Document Details

Uploaded by WellManagedWerewolf7907

University of Ghana

2024

Winnie M. Onsongo

Tags

sampling theory statistics survey methods lecture notes

Summary

These lecture notes cover the theory of sampling for a course (STAT 443). They discuss pre-requisites, assessment, the goals and objectives of the course, and introduce different types of surveys, including considerations for data collection and analysis. The document is delivered in a slide-presentation format.

Full Transcript

STAT 443: THEORY SAMPLING
Dr. Onsongo W. Lecture Notes
Lecturer: Winnie M. Onsongo (PhD)
Department of Statistics and Actuarial Science, University of Ghana
November 5, 2024

Pre-requisites
Introduction to statistics and probability; basic algebra; research methodology.

Assessment and Grading
The grade distribution is as follows: Assignments + Interim Assessment (30%); Exam (70%). Homework should be submitted on time. Late submissions will not be accepted. Duplicate solutions will not be graded.

Goal: This module focuses on the quantitative design and analysis aspects of household surveys.

Course Objectives:
1. Discuss some theoretical and practical considerations required for survey data.
2. Discuss strategies for conducting a preliminary analysis of a large-scale, complex survey.
3. Develop data handling skills to prepare data for analysis.

Introduction
Sample survey methods is a branch of statistics that deals with the principles and methods of collecting and analysing data from finite populations. The subject involves setting up appropriate statistical principles and constructing suitable statistical methods for collecting and analysing such data.

Society makes use of published and broadcast reports of sample surveys which aim to describe the world we live in. In such surveys, samples are drawn from the population and are used to reflect on the population they claim to represent. We need to express all aspects of our lives in quantitative terms, for no argument is complete without figures to back it up.

For example, we can say 'student enrollment at the University of Ghana has increased by 11% over the last year' or '37% of workers in Accra spend more than one hour of the working day travelling to and from work'.
The presentation of such figures is designed to keep us informed of the situation in our environment. It is used to support proposals for improving amenities such as infrastructure, or at least to place a discussion in its proper perspective. Those who present such statistical data have a responsibility to do so fairly and objectively, with no ill intent, and to provide sufficient detail on the source, scope and method of data collection for proper interpretation and further analysis.

Sample Surveys and Opinion Polls
Fundamental to our study is the idea of a finite population whose individuals have certain measures of interest. We may be able to derive the exact value of such a characteristic by studying every individual in the population. Most often, however, limited resources dictate that we estimate the characteristic by studying a smaller group of individuals and infer its value from the information provided by the sample.

Often, the information gathered on individuals is quantitative and factual, describing social or economic characteristics. Elsewhere in statistics it is usually assumed that data arise as independent observations from a population according to some probability model. In survey sampling, by contrast, there is a fixed, determined, finite set of individuals to be observed.

When surveying human populations, the information gathered may include individual views or preferences. In this case the survey is referred to as an opinion poll. When the concern is the quality of products, sample surveys can help in the field of market research.

Definitions
Finite population: A collection of units such as households, people, cities, countries, etc.
Census: A complete enumeration of the units in the population.
Population: The collection of all the sampling units in a given region at a particular point in time or during a particular period.
Sampling unit: An element or a group of elements on which observations can be taken. For example, if the objective is to determine the income of a particular person in the household, then the sampling unit is that person, and income is the value observed on the unit.
Representative sample: A sample in which all the salient features of the population are present.
Sampling frame: A list of all the units of the population to be surveyed, e.g. all the students in a particular university, listed along with their registration numbers, make up a sampling frame.

Types of sampling frames include:
i. Static/exhaustive list: a single list contains all sample frame units. This list exists prior to the start of the study.
ii. Dynamic list: the list is generated together with the sample, e.g. all patients visiting a general practitioner during the coming year.

Why sample?
Sampling allows one to obtain a representative picture of the population without studying the entire population. For a variety of reasons, study of the population is restricted to sampling some of its members and using the information gained to infer the characteristics of the population as a whole. Cochran (1977) gave four reasons why sampling is preferred to complete enumeration:
i. Reduced cost: Expenditure is lower when data are secured from only a section of the population.
ii. Greater speed: A sampling approach allows data to be collected and processed more quickly than a complete count.
iii. Greater accuracy: The smaller scale of a sample survey means that qualified personnel can be employed and given intensive training and supervision. More effort can also be put into quality control when processing the data.
iv. Greater scope and flexibility: Sampling ensures that greater effort can be invested in data collection for each sampled unit.
The samples can then be used to collect data that would be difficult to measure via complete enumeration.

Types of surveys
Various types of surveys can be conducted, depending on the objectives to be fulfilled.
a. Demographic surveys: Conducted to collect demographic data, e.g. household surveys, family size, number of males in families, etc.
b. Educational surveys: Conducted to collect educational data, e.g. how many children go to school, how many persons are graduates, etc.
c. Economic surveys: Conducted to collect economic data, e.g. industrial production, consumer expenditure, etc. Such data are helpful in constructing the indices that indicate the overall economic growth of the country.
d. Public polls and surveys: Conducted to collect public opinion on a particular issue.
e. Marketing surveys: Conducted by major companies, manufacturers or service providers to collect data related to marketing. Such data are used to identify emerging consumer needs that may require attention, such as product enhancement, development, or the introduction of a new product into the market.

Principal steps in a survey
1. Objectives of the survey: These have to be clearly defined and well understood by the person planning to conduct it.
2. Population to be sampled: This is determined by the objectives of the survey, e.g. in an educational survey seeking to know the transition rates of pupils from basic school to junior high school, pupils will constitute the population to be sampled.
3. Data to be collected: Decide which data are relevant for fulfilling the objectives of the survey and ensure that no essential data are omitted.
Asking too many questions whose answers are never used lowers the quality of the responses, resulting in less efficient statistical inferences.
4. Degree of precision required: The results of any sample survey are always subject to some uncertainty, which can be reduced by taking larger samples or using superior instruments. This involves more cost and more time, so it is very important to decide on the degree of precision required in the data.
5. Method of measurement: The choice of measuring instrument and the method of measuring the data from the population need to be specified clearly, e.g. whether the data are to be collected through interview, questionnaire, personal visit, or a combination of these approaches. The forms in which the data are to be recorded need to be prepared accordingly.
6. The sampling frame: The frame must cover the whole population and the units must not overlap, in the sense that every element in the population must belong to one and only one unit.
7. Selection of sample: The size of the sample needs to be specified for the given sampling plan. This helps in determining and comparing the relative cost and time of different sampling plans.
8. The pre-test: It is advisable to try the questionnaire and field methods on a small scale. This may reveal in advance some of the problems the surveyor would face in the field in a large-scale survey.
9. Organization of the field work: The focus is on how to conduct the survey, handle administrative issues, provide proper training to surveyors, and set out procedures and plans for handling non-response and missing observations. A procedure for checking the quality of returns should be given. It should be made clear how to handle the situation when a respondent is not available.
10.
Summary and analysis of data: Based on the objectives of the survey, a suitable statistical tool that can answer the relevant questions is chosen. The tabulating procedures, methods of estimation and tolerable amount of error in the estimation need to be decided before the start of the survey. Different methods of estimation may be available to answer the same query from the same data set. The data therefore need to be collected in a manner that is compatible with the chosen estimation procedure.

Methods of data collection
1. Physical observations and measurements: The surveyor meets the respondent in person, observes the sampling unit and records the data.
2. Personal interview: The surveyor is supplied with a well-prepared questionnaire. The surveyor goes to the respondents and asks the questions exactly as they appear in the questionnaire.
3. Mail inquiry: The well-prepared questionnaire is sent to the respondents through postal mail, e-mail, etc. The respondents are requested to fill in the questionnaire and send it back. Postal questionnaires are accompanied by a self-addressed, stamped envelope to avoid non-response due to the cost of postage.
4. Web-based inquiry: The survey is conducted online through web pages. Various websites provide such a facility. The questionnaire has to be prepared in the website's format and the link is sent to the respondents by email. By clicking on the link, the respondent is taken to the website, where the answers are given online.
5. Registration: The respondent is required to register the data at some designated place. For example, the numbers of births and deaths, along with the details provided by family members, are recorded at the city municipal office.
6. Transcription from records: The sample of data is collected from information that has already been recorded.
The following points should be noted when interviewing respondents:
The interviewer must not influence the response.
The interviewer has to ensure that the question is answered with the highest possible accuracy.

A good question has to fulfill the following properties:
1. It has to be possible to ask the question as formulated.
2. It has to be possible to formulate and answer the question without having to amplify on it.
3. If amplification is nevertheless necessary, standardized procedures must exist as to how this should take place.

How to sample
The main goal is to draw a sample that is a fair representation of the population and that leads to estimates of population characteristics with as great an accuracy as possible. Various intuitively appealing methods of sampling have been used. Examples of such ad-hoc methods include:
1. Accessibility/haphazard sampling: A sample is chosen mainly because of its ease of access. We take the most easily obtainable observations. Shortcomings in terms of lack of representativeness are common.
2. Judgmental/purposive sampling: The surveyor recognizes that the target population can contain different types of individuals with different characteristics and ease of access. The surveyor exercises an intentional, subjective choice in drawing a 'representative sample'.
3. Quota sampling: This method is much more structured than haphazard or judgmental sampling. A well-defined methodology is used to determine the numbers needed in each of the quotas, with the aim of filling the quotas on a randomized basis.

The major criticism of haphazard and judgmental sampling is that their results are unconvincing, because there is no guiding principle for determining representativeness or for assessing the accuracy of estimators based on such sampling.
We therefore introduce an element of 'randomness' into the sampling procedure and draw our samples according to some probability mechanism.

Central concept of probability sampling
Of interest are the population characteristics defined with reference to a measure $Y$, with value $Y_i$ for unit $i$ of the $N$ units in the population. Some of these population characteristics include:
1. The population total, $Y_T = \sum_{i=1}^{N} Y_i$
2. The population mean, $\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i = \frac{Y_T}{N}$
3. The proportion $P$ of members of the population which fall into some category of classification for the measure $Y$, e.g. in a survey of the performance of students, $P$ may be the proportion of students who scored more than 60% in a recent test.
The goal of a sample survey is to estimate at least one of these characteristics from the information contained in a sample of size $n \le N$ drawn from the population.

Probability sampling
In any probability sampling scheme, we start by specifying the size $n$ of the sample to be drawn. We then consider all possible samples of size $n$ that could be drawn from the population, i.e. $S_1, S_2, \ldots$ The probability sampling scheme is defined by assigning a probability $\pi_i$ to each $S_i$, and a particular sample $S$ can then be drawn according to this probability scheme. Different probability sampling schemes are possible, corresponding to different probability distributions over the set of possible samples $S_1, S_2, \ldots$

Probability sampling methods
Some of the probability sampling methods and their rationale include:
Simple random sampling: This is the most basic method of probability sampling and often acts as a basis for other probability sampling schemes.
Systematic sampling: Chosen to increase precision and/or to ensure sampling with certainty for a subgroup of units.
Stratification: Performed
▷ To increase precision of population-level estimates
▷ To allow for estimation at sub-population level

Selection Probability
This is the probability that a unit in the population is included in the sample.
It should be known or (consistently) estimable.
It does not have to be constant.
The selection probability may not be known in advance. It is sufficient to know or estimate it at the time of analysis. This is common with dynamic lists, e.g. when trying to estimate the number of patients likely to visit a specialist during the coming year by asking "How frequently have you visited the doctor during the last year?"

At times, external factors, such as initiatives by respondents, influence the chance of being included. When this happens, the integrity of the study is in jeopardy. To each member of the sampling frame we attach a non-zero probability of being selected, and then use probabilistic techniques to draw the sample for the survey.

Notations
Conventional notations:

Quantity                       Population   Sample
Size                           $N$          $n$
Unit index                     $I$          $i$
Value for a unit               $Y_I$        $y_i$
Average                        $\bar{Y}$    $\bar{y}$
Total                          $Y_T$        $y_t$
Total estimated from sample                 $\hat{y}$

Sampling from a finite population
Consider an artificial population: $P = \{3, 5, 10, 14\}$
The listing of this population, with $I = 1, 2, 3, 4$ and $N = 4$, can be done as follows:

$I$   $Y_I$
1     3
2     5
3     10
4     14

A sampling mechanism assigns to each member of the population a non-zero probability of being selected. These probabilities are necessary to:
i. Study the properties of a sampling method
ii. Conduct estimation and statistical inference

The samples can be drawn in two possible ways:
i. Without replacement: units, once chosen, are not placed back into the population.
ii. With replacement: the chosen units are placed back in the population.
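The two ways of drawing samples can be sketched in Python for the artificial population $P = \{3, 5, 10, 14\}$. This is a minimal illustration, not part of the original slides: the sample size `n = 2` and the variable names are assumptions chosen for the example. Samples without replacement are treated as unordered, while draws with replacement are listed as ordered pairs.

```python
from itertools import combinations, product

population = [3, 5, 10, 14]  # artificial population from the notes
n = 2                        # sample size (assumed for this illustration)

# Without replacement: a unit can enter the sample at most once.
# combinations() ignores order, so (3, 5) and (5, 3) count as one sample.
without_replacement = list(combinations(population, n))

# With replacement: a unit is put back after each draw and may repeat.
# product() lists every ordered sequence of n draws.
with_replacement = list(product(population, repeat=n))

print(len(without_replacement), without_replacement)
print(len(with_replacement), with_replacement)
```

Counting the two lists gives the number of possible samples under each scheme, which is the quantity $S$ used when probabilities are later assigned to samples.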
Sample of size $n = 1$:
Enumeration: $S_1 = \{\{3\}, \{5\}, \{10\}, \{14\}\}$, with $S = 4$ and $s = 1, 2, 3, 4$.

Sample of size $n = 2$, when sampling with replacement (the order of the draws is distinguished):
Enumeration: $S_2 = \{\{3, 5\}, \{3, 10\}, \{3, 14\}, \{5, 10\}, \{5, 14\}, \{10, 14\}, \{5, 3\}, \{10, 3\}, \{14, 3\}, \{10, 5\}, \{14, 5\}, \{14, 10\}, \{3, 3\}, \{5, 5\}, \{10, 10\}, \{14, 14\}\}$, with $S = 16$ and $s = 1, 2, \ldots, 16$.

Characteristics of the Population
Population average:
$\bar{Y} = \frac{1}{4}\sum_{I=1}^{4} Y_I = 8$
Population variance:
$\sigma_Y^2 = \frac{1}{4-1}\sum_{I=1}^{4} (Y_I - \bar{Y})^2 = \frac{(3-8)^2 + (5-8)^2 + (10-8)^2 + (14-8)^2}{3} = 24.6667$
Population total:
$Y = \sum_{I=1}^{4} Y_I = 3 + 5 + 10 + 14 = 32$

China Town
$N = 8$, $I = 1, 2, \ldots, N = 8$
Suppose we have two variables under consideration, i.e.
$X_I$: number of building lots in block $I$
$Y_I$: number of dwellings in block $I$

Listing of China Town:

$I$   $X_I$   $Y_I$
1     1       1
2     3       2
3     4       3
4     6       4
5     7       5
6     8       6
7     10      7
8     11      8

Population totals: $X = 50$, $Y = 36$. There are 50 lots and 36 dwellings, hence 14 empty lots.
Population averages: $\bar{X} = 6.25$, $\bar{Y} = 4.50$
Population variances:
$\sigma_X^2 = \frac{1}{8-1}\sum_{I=1}^{8} (X_I - 6.25)^2 = 11.9286$
$\sigma_Y^2 = \frac{1}{8-1}\sum_{I=1}^{8} (Y_I - 4.50)^2 = 6.0$
The ratio of the number of dwellings to the number of lots:
$R = \pi = \frac{Y_T}{X_T} = \frac{\bar{Y}}{\bar{X}} = 0.72$

Proportion
A proportion can be considered the average of a random variable. Define the (related, but different) population of all lots, $I = 1, 2, \ldots, 50$, and let
$Z_I = 1$ if lot $I$ is built upon, and $Z_I = 0$ if lot $I$ is empty.
Then,
$Z = \sum_{I=1}^{50} Z_I = 36$
$\bar{Z} = \frac{1}{50}\sum_{I=1}^{50} Z_I = 0.72$

Population quantities
Population covariance:
$\sigma_{XY} = \frac{1}{N}\sum_{I=1}^{N} (X_I - \bar{X})(Y_I - \bar{Y})$
$S_{XY} = \frac{1}{N-1}\sum_{I=1}^{N} (X_I - \bar{X})(Y_I - \bar{Y})$
Population correlation:
$\rho_{XY} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y} = \frac{S_{XY}}{S_X S_Y}$
Note that the variances computed earlier use the divisor $N - 1$, which corresponds to the $S$ version in this notation.

Sampling mechanisms
Recall that a population $P$ with $N$ members gives rise to a meta-population $\mathcal{S}$ of $S$ samples. A sampling mechanism assigns a probability $P_s$ ($s = 1, 2, \ldots, S$) to each sample.
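As a numerical check on the population quantities worked out above, the China Town listing can be run through a short Python sketch. The helper names `mean` and `covariance` are hypothetical and introduced only for this illustration; the variances use the divisor $N - 1$, as in the worked values, and the covariance is computed with both divisors.

```python
from math import sqrt

# China Town listing from the notes (one entry per unit I = 1..8).
X = [1, 3, 4, 6, 7, 8, 10, 11]   # building lots
Y = [1, 2, 3, 4, 5, 6, 7, 8]     # dwellings
N = len(X)

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def covariance(a, b, divisor):
    """Sum of cross-products of deviations, divided by the given divisor."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / divisor

x_total, y_total = sum(X), sum(Y)   # population totals
x_bar, y_bar = mean(X), mean(Y)     # population averages
var_x = covariance(X, X, N - 1)     # variance with divisor N - 1
var_y = covariance(Y, Y, N - 1)
sigma_xy = covariance(X, Y, N)      # covariance with divisor N
s_xy = covariance(X, Y, N - 1)      # covariance with divisor N - 1
ratio = y_total / x_total           # dwellings per lot
rho = s_xy / sqrt(var_x * var_y)    # correlation, using matching N - 1 divisors

print(x_total, y_total, x_bar, y_bar)
print(round(var_x, 4), round(var_y, 4), ratio)
print(round(sigma_xy, 4), round(s_xy, 4), round(rho, 4))
```

The correlation is computed from quantities that all share the divisor $N - 1$; mixing divisors between the numerator and the denominator would give a different value.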
Obviously, to be valid, the $P_s$ must satisfy:
$P_s \ge 0$, for all $s = 1, 2, \ldots, S$
$\sum_{s=1}^{S} P_s = 1$

For the artificial population with $n = 2$:

$s$   Sample     Probability
1     {3,5}      $P_1$
2     {3,10}     $P_2$
3     {3,14}     $P_3$
4     {5,10}     $P_4$
5     {5,14}     $P_5$
6     {10,14}    $P_6$
7     {3,3}      $P_7$
8     {5,5}      $P_8$
9     {10,10}    $P_9$
10    {14,14}    $P_{10}$

Sampling with Equal Probabilities
The simplest mechanism is to assign the same selection probability to each individual. There are two versions:
Without replacement: Every individual can enter the sample at most once.
With replacement: Every individual can enter the sample multiple times; precisely, between 0 and $n$ times.
Both give rise to Simple Random Sampling (see later).

For the artificial population with $n = 2$:

$s$   Sample     $P_s$ without   $P_s$ with
1     {3,5}      1/6             2/16
2     {3,10}     1/6             2/16
3     {3,14}     1/6             2/16
4     {5,10}     1/6             2/16
5     {5,14}     1/6             2/16
6     {10,14}    1/6             2/16
7     {3,3}      0               1/16
8     {5,5}      0               1/16
9     {10,10}    0               1/16
10    {14,14}    0               1/16

Selection without replacement sets the selection probability of all samples with replication to 0. Under sampling with replacement, the heterogeneous samples are twice as likely to be selected as the homogeneous samples. The reason is that, for example, {3, 5} can be selected in the order (3, 5) or (5, 3). In contrast, {5, 5} comes into being in only one way. In general, the probability depends on the number of permutations a sequence can have.

Points to note
It is important that samples be taken in a totally random fashion (or the closest approximation to it that one can accomplish in practice).
Classical, historic models:
i. Balls drawn from an urn (e.g. lotto games)
ii. Tossing of dice
Modern, realistic model: computerized pseudo-random generators.

Sample quantities
Sample fraction: $f = \frac{n}{N}$
This quantity is relevant only in finite populations.
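The equal-probability table and the sample fraction above can be reproduced by enumeration. The sketch below is illustrative only: it assumes samples of size `n = 2` from the artificial population, and the dictionary names `p_without` and `p_with` are hypothetical. It uses exact fractions, so `Fraction(2, 16)` is displayed in reduced form as `1/8`.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, product

population = [3, 5, 10, 14]  # artificial population from the notes
n = 2

# Without replacement: the unordered samples are all equally likely.
unordered = list(combinations(population, n))
p_without = {s: Fraction(1, len(unordered)) for s in unordered}

# With replacement: the ordered draws are all equally likely. Sorting each
# draw collapses (3, 5) and (5, 3) into the same unordered sample, so the
# count records how many orderings lead to that sample.
ordered = list(product(population, repeat=n))
orderings = Counter(tuple(sorted(draw)) for draw in ordered)
p_with = {s: Fraction(c, len(ordered)) for s, c in orderings.items()}

# Both mechanisms are valid: probabilities are non-negative and sum to 1.
total_without = sum(p_without.values())
total_with = sum(p_with.values())

# Sample fraction f = n / N.
f = Fraction(n, len(population))

for sample in sorted(p_with):
    print(sample, p_without.get(sample, Fraction(0)), p_with[sample])
print(total_without, total_with, f)
```

Samples that repeat a unit are absent from `p_without`, which is why the loop falls back to a probability of 0 for them.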
Carefully distinguish between three quantities:
Population quantity: a quantity computed using all $N$ population units.
Sample quantity: the same quantity, computed using the $n$ units selected into the sample.
Estimate: an "approximation" of the population quantity, using only the $n$ sample units.

Some Observations
When is an estimator good? To answer this question, we study characteristics of the estimator, i.e. of the set of estimates it produces over all possible samples. The quantities commonly used are:
Expectation.
Variance (precision), leading to the standard error.
Bias.
Mean square error.
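The four quantities listed under Some Observations can be computed exactly for a small case. The sketch below is an illustration under stated assumptions, none of which is fixed by the slides at this point: the estimator is taken to be the sample mean, the target is the population mean of the artificial population {3, 5, 10, 14}, and samples of size `n = 2` are drawn without replacement with equal probabilities.

```python
from fractions import Fraction
from itertools import combinations

population = [3, 5, 10, 14]             # artificial population from the notes
N, n = len(population), 2               # n = 2 is assumed for illustration
pop_mean = Fraction(sum(population), N)

# All samples without replacement, each with the same probability P_s.
samples = list(combinations(population, n))
p_s = Fraction(1, len(samples))

# One estimate (the sample mean) per possible sample.
estimates = [Fraction(sum(s), n) for s in samples]

# Expectation: probability-weighted average of the estimates.
expectation = sum(p_s * est for est in estimates)
# Variance: spread of the estimates around their own expectation.
variance = sum(p_s * (est - expectation) ** 2 for est in estimates)
# Standard error: square root of the variance.
standard_error = float(variance) ** 0.5
# Bias: how far the expectation is from the population quantity.
bias = expectation - pop_mean
# Mean square error: spread of the estimates around the population quantity.
mse = sum(p_s * (est - pop_mean) ** 2 for est in estimates)

print(expectation, variance, standard_error, bias, mse)
```

In this setting the mean square error equals the variance plus the squared bias, which gives a quick consistency check on the four numbers.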
