Multistage Sampling Techniques PDF
Document Details
Uploaded by DetachableUranus
Summary
This document details multistage sampling techniques, including two-stage equal cluster sampling and stratified multi-stage sampling. It also provides an overview of sampling frames, different types of sampling frames, questionnaires, sample size estimation, data collection, and ethical considerations in research.
Full Transcript
Chapter 3 Multistage Sampling: Two-Stage Equal Cluster Sampling

In multi-stage sampling, the sample is selected in stages: the sampling units at each stage are sub-sampled from the (larger) units chosen at the previous stage, with an appropriate method of selection (simple random sampling with or without replacement, systematic sampling, probability proportional to size, etc.) adopted at each stage.

In other words, the universe is divided into a number of first-stage (or primary sampling) units, which are sampled; the selected first-stage units are then sub-divided into a number of smaller second-stage (or secondary sampling) units, which are again sampled. The process continues until the ultimate sampling units are reached.

Reasons for multi-stage sampling
Multi-stage sampling is adopted in a number of situations:
1. Sampling frames may not be available for all the ultimate observational units in the universe, and it is extremely laborious and expensive to prepare such a complete frame. Here, multi-stage sampling is the only practical method.
2. Even when suitable sampling frames for the ultimate units are available for the universe, a multi-stage sampling plan may be more convenient than a single-stage sample of the ultimate units, because the cost of surveying and supervising such a sample in large-scale surveys can be very high due to travel, identification, contact, etc. This point is closely related to the consideration of cluster sampling.
3. Multi-stage sampling can be a convenient means of reducing response errors and improving sampling efficiency by reducing the intra-class correlation coefficient observed in natural sampling units, such as households or villages.
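The staged selection described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: it assumes equal-sized clusters, simple random sampling without replacement at both stages, and a made-up population; the names `two_stage_sample` and `estimate_mean` are hypothetical.

```python
import random


def two_stage_sample(population, n_clusters, m_units, seed=None):
    """Draw a two-stage sample.

    population: dict mapping a first-stage unit id to the list of values
    of its second-stage units (all lists assumed to have the same length).
    """
    rng = random.Random(seed)
    # Stage 1: simple random sample (without replacement) of first-stage units.
    chosen = rng.sample(sorted(population), n_clusters)
    # Stage 2: simple random sample (without replacement) inside each chosen unit.
    return {c: rng.sample(population[c], m_units) for c in chosen}


def estimate_mean(sample):
    """Mean of the sampled cluster means (equal cluster sizes assumed)."""
    cluster_means = [sum(values) / len(values) for values in sample.values()]
    return sum(cluster_means) / len(cluster_means)


# Hypothetical population: 10 villages, each with 20 fields (values are made up).
_rng = random.Random(0)
population = {
    f"village_{i}": [_rng.uniform(1.0, 5.0) for _ in range(20)]
    for i in range(10)
}

sample = two_stage_sample(population, n_clusters=4, m_units=5, seed=1)
print("selected first-stage units:", list(sample))
print("estimated mean per field:", round(estimate_mean(sample), 3))
```

With unequal cluster sizes or probability-proportional-to-size selection, the simple average of cluster means used here would need to be replaced by a weighted estimator.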
For example, in a crop survey:
- villages are the first-stage units,
- fields within the villages are the second-stage units, and
- plots within the fields are the third-stage units.

Assuming that the clusters are homogeneous, two-stage sampling generally allows more clusters to be selected at the first sampling stage, which may increase precision. Two-stage sampling is generally more expensive than cluster sampling with the same sample size but less expensive than stratified random sampling.

Mean and variance of two stage sampling
Let the population consist of N clusters, each consisting of the same number M of subunits. The population values are then $Y_{ij}$, $i = 1, 2, \dots, N$, $j = 1, 2, \dots, M$. The population mean per subunit is

$$\bar{Y} = \frac{1}{NM} \sum_{i=1}^{N} \sum_{j=1}^{M} Y_{ij},$$

and $\bar{Y}_i = \frac{1}{M} \sum_{j=1}^{M} Y_{ij}$ is the population mean of the $i$-th first-stage unit (fsu).

Stratified Multi-stage Sampling
Stratified multi-stage designs are the commonest of all types of sample designs. This is because they combine, as regards costs and efficiency, the advantages of both stratification and multi-stage sampling.

In stratified sampling, the strata should ideally be formed so as to be internally homogeneous and heterogeneous with respect to one another (with respect to the values of the study variables in srs, or with respect to the ratios of the values of the study variable to the size variable in pps sampling): this, as has been observed before, results in increased efficiency. On the other hand, in a multi-stage design the first-stage units should be internally heterogeneous and homogeneous with respect to one another.

In a stratified multi-stage design, all the strata of course have to be covered, but when the strata are made internally homogeneous and the first-stage units internally heterogeneous, only a small number of first-stage units need be selected from each stratum in order to provide an efficient sample.

Chapter 4 Preparation of Sampling Frames

Definition: A sampling frame is a listing of the units from which the sample selection is to be made at any stage of sampling.
The sampling frame should be an accurate representation of the population. The frame consists of materials, procedures, and devices that identify, distinguish, and allow access to the elements of the target population. It is composed of a finite set of units to which the probability sampling scheme is applied. Rules or mechanisms for linking the frame units to the population elements are an integral part of the frame.

The frame also includes auxiliary information (measures of size, demographic information) used for special sampling techniques, such as stratification and probability-proportional-to-size sample selection, or for special estimation techniques, such as ratio or regression estimation. For surveys with multistage sample designs, a frame is needed for each stage of selection.

Frames in Multi-stage Design
In a multistage design, the sampling units used at the first stage of sampling are called primary sampling units (PSUs). In designs with three or more stages, units used for the intermediate stages are called secondary or second-stage sampling units (SSUs), third-stage sampling units, and so on. Those used at the final (ultimate) stage are called ultimate sampling units (USUs).

For example, in a three-stage design for a household survey the sampling units are:
- PSUs: districts (woredas)
- SSUs: enumeration areas (EAs; kebeles)
- USUs: housing units (households)

Any sampling frame used for the first stage of selection must cover the entire survey population (the designated PSUs).

Types of Sampling Frames
Area frames: In an area sampling frame the units are variously labelled as county, district, tract, woreda, village, etc.
List frames: A list sampling frame is quite simply a frame made up of a list of the target population units.
Master sample frames: The master sample frame is designed and constructed to be a stable, established framework for selecting the sub-samples that are needed for particular surveys, or for rounds of the same survey, over an extended period of time.
Clustered frames: It may well happen that there is no good population frame for the ultimate sampling units, or that creating one would be much too expensive.

Desirable Properties of Frames
The properties can be grouped into three major categories: those related to quality, those related to efficiency, and those related to cost.

1. Quality Related Properties
Quality related properties of a frame are those which make it possible to minimize non-sampling errors, especially coverage errors, that might occur because of deficiencies in the frame. The desirable quality related properties are that the frame:
- Consists of well-defined units: area units have recognized boundaries that are clearly delineated on various types of maps, and for non-area units a precise standard definition of the unit is established.
- Has units with adequate identifiers: frame units will usually have both unique numerical identifiers (primary identifiers) and other identifiers, such as names and addresses (secondary identifiers).
- Is complete: the completeness of a sampling frame concerns the extent to which the intended coverage is actually achieved.
- Is up to date: frames need periodic updating, because some units are likely to change with time.
- Has stable units: stable in number, definition, and size.

2. Efficiency Related Properties
Efficiency related properties of frames are those that make possible and facilitate the use of efficient survey designs. Efficiency in this context refers to the relationship between sampling error and the cost of producing survey estimates. The most efficient survey design is the one that reduces sampling error at the lowest possible cost. Other properties of a frame that facilitate the use of efficient sample designs include:
- Choice of sampling units available: the frame units are organized in a hierarchical structure and identifiers are assigned to the sampling units.
- Good quality maps of units available: maps showing the boundaries of each unit.
- Easy to manipulate and process: the frame is computerized.

3. Cost Related Properties
The preparation of sampling frames can be an expensive exercise. A low cost of frame development can best be achieved by treating the development, maintenance, and updating of frames for censuses and household surveys as a single, integrated, ongoing process. The cost of frame preparation must be considered at the planning stage and must be budgeted. If two alternative frame sources would result in the same quality and efficiency, the one with the lower cost of development, use, and maintenance would obviously be preferred.

Chapter 5 Sample design

Sampling Methods
o The general aim of all sampling methods is to obtain a sample that is representative of the target population.
o When selecting a sampling method we need some minimal prior knowledge of the target population.
o With this knowledge and some reasonable assumptions, we can estimate the sample size required to obtain a reasonable estimate of the population characteristic, with acceptable precision and accuracy.
o Sampling methods can be categorized according to the approach they take to the probability of a particular unit being included.
o Most sampling methods attempt to select units such that each has a definable probability of being chosen.

Choice of Sample Design
A sample design is a joint effort of the survey statistician and other experts, such as subject matter specialists, data users, and the survey executing agency. Statisticians mostly require information from the other experts in order to propose a sample design that will meet the users' required specifications at the lowest possible cost. The issues on which they should confer and reach agreement include the objectives of the survey, the variables to be measured, the type of estimates required, the levels of reliability and validity needed for the estimates, and any restrictions placed on the survey with respect to timeliness and cost.
Setting Objectives and Preliminary Investigation of the Survey
The survey objectives should be clearly specified and precisely stated at the outset. Other issues related to the objectives and relevant to the survey must be assessed at an early stage of the design:
- A clear specification of the problem and its cause in a precise and definite statement.
- Identification and definition of the population to be studied and a description of the coverage, such as the geographic area, branch of the economy or social group, or other classification of the population covered by the survey.
- A clear specification, in statistical terms, of the information to be collected.
- The level of breakdown by which the results are to be tabulated: regions, age groups, sexes, residences, and any other classification of the population covered.
- The level of accuracy desired, or the specification of tolerable errors.
- The kinds of results expected, and the users as well as the uses of the data.
- Timeliness: how soon the results are needed. The utility of survey results falls off gradually with the passage of time following the data collection stage of the survey.

Sampling Plan
There are different ways of designing a sample survey, but the idea of an optimum design starts with the features of the sampling plan, such as the selection process and the estimation procedures.

Selection Process
After making an assessment of the survey objectives, the topics to be covered, the description of coverage, and other issues, the next step in the selection process is to make a choice of design.
Choice of design: there are different sample designs, which are likely to be appropriate for different types of survey and in different circumstances.

Sample size calculation for estimation of population proportion
The minimum sample size required for a very large population is:

$$n = \frac{Z_{\alpha/2}^{2}\, p(1-p)}{w^{2}}$$

In order to calculate the required sample size to study a population proportion, you need to know the following:
a. A reasonable estimate of the key proportion to be studied.
◦ If you do not have any information about the proportion, take it as 50%.
◦ If it is given as a range, take the value closest to 50%.
b. The degree of accuracy required.
◦ That is, the allowed deviation from the true proportion in the population as a whole.
◦ It can be within 1% or 5%, etc.
c. The confidence level required, usually specified as 95%.

Example 1 (Prevalence of diarrhea)
Assume the following information is available: p = 0.26, w = 0.03, Z = 1.96 (i.e., for a 95% C.I.).
Solution: $n = \frac{(1.96)^{2} (0.26 \times 0.74)}{(0.03)^{2}} = 821.25 \approx 822$
Thus, the study should include at least 822 subjects.

Example 2
A hospital administrator wishes to know what proportion of discharged patients are unhappy with the care received during hospitalization. A 95% confidence interval is desired, estimating the proportion to within 5%.
Solution: $n = \frac{Z_{\alpha/2}^{2}\, p(1-p)}{w^{2}} = \frac{(1.96)^{2} (0.5 \times 0.5)}{(0.05)^{2}} = 384.2 \approx 385$ patients
N.B. If you do not have any information about p, take it as 50%.

Sample Size Estimation
The sample size for a survey must be decided upon at the planning stage, together with the sample design. The sample size required depends upon the following factors:
- the level of precision required in the estimate, which requires specifying the acceptable margin of error and the confidence level;
- the sample design to be used, since different designs will produce different levels of precision for the same sample size.

The basis for calculating the size of samples is that there is a minimum sample size required for a given population to provide estimates with an acceptable level of precision.
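The two worked examples above (diarrhea prevalence and patient satisfaction) can be reproduced with a short Python sketch of the formula n = Z² p(1 − p) / w². The function name `sample_size_for_proportion` and its defaults are illustrative assumptions; the formula applies to a very large population and includes no finite-population correction or design effect.

```python
import math


def sample_size_for_proportion(p=0.5, w=0.05, z=1.96):
    """Minimum sample size to estimate a proportion in a very large population.

    p: anticipated proportion (use 0.5 when nothing is known)
    w: allowed deviation from the true proportion (margin of error)
    z: standard normal value for the confidence level (1.96 for 95%)
    """
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    if w <= 0:
        raise ValueError("w must be positive")
    n = (z ** 2) * p * (1 - p) / (w ** 2)
    # Round up: a fractional subject cannot be sampled.
    return math.ceil(n)


print(sample_size_for_proportion(p=0.26, w=0.03))  # Example 1: 822
print(sample_size_for_proportion(p=0.5, w=0.05))   # Example 2: 385
```

Because p(1 − p) is largest at p = 0.5, taking p as 50% when nothing is known gives the most conservative (largest) sample size for a given w.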
If the sampling process is carried out correctly, using an effective sample size, the sample will be representative and the estimates it generates will be useful. Another point that needs attention in sample size determination is that several variables may be equally important in a particular survey, and the precision requirement for each of these will then produce a different estimate of the sample size needed.

Estimation Procedure
Estimating population characteristics is a major objective of surveys. Population estimates are calculated from the sample data and reported together with an indication of the precision of the estimate, obtained from the sample variance. The calculation of population estimates is derived from the type of sampling design used for the survey. Depending on the design, estimates are raised from the sample to the population by multiplying by the inverse of the sampling fraction.

Method of Data Collection
Data collection refers to the purposive gathering of information relevant to the subject matter of the study from the units under investigation. Two types of data are used in a research study: primary and secondary.

Primary data refers to data collected either by, or under the direct supervision and instruction of, the researcher.

Methods of primary data collection
There are several methods of collecting primary data. The most important ones are:
- questionnaires,
- the interview method, and
- the observation method.

Secondary data refers to data which are not originated by the researcher, but which he or she obtains from other sources. Any data that have been collected earlier for some other purpose are secondary data in the hands of the individual who is using them. Secondary data may be either published or unpublished. Published data are usually available in various publications, journals, books, historical documents, and other sources of published information.
Ethical consideration
o Ethics is the systematic study of value concepts ('good', 'bad', 'right', 'wrong') and of the general principles that justify applying these concepts. OR
o Ethics focuses on the disciplines that study standards of conduct, such as philosophy, theology, law, psychology, or sociology. For example, a "medical ethicist" is someone who studies ethical standards in medicine.
o In general, ethics is defined as a method, procedure, or perspective for deciding how to act and for analyzing complex problems and issues.

Codes and Policies for Research Ethics
Many different professional associations, government agencies, and universities have adopted specific codes, rules, and policies relating to research ethics. The following is a rough and general summary of some ethical principles that various codes address:
Honesty: Honestly report data, results, methods and procedures, and publication status. Do not fabricate, falsify, or misrepresent data. Do not deceive colleagues, granting agencies, or the public.
Objectivity: Avoid or minimize bias or self-deception. Disclose personal or financial interests that may affect research.
Integrity: Keep your promises and agreements.
Carefulness: Avoid careless errors and negligence; carefully and critically examine your own work and the work of your peers. Keep good records of research activities, such as data collection and research design.
Openness: Share data, results, ideas, tools, and resources. Be open to criticism and new ideas.
Non-Discrimination: Avoid discrimination against colleagues or anybody else on the basis of factors that are not related to their scientific competence and integrity.
Animal Care: Show proper respect and care for animals when using them in research. Do not conduct unnecessary or poorly designed animal experiments.
Human study participants' protection: When conducting research on human study participants, minimize harms and risks and maximize benefits; respect human dignity, privacy, and autonomy.

Research Ethics when Dealing with Human Participants
All research involving human beings should be conducted in accordance with three basic ethical principles.
Respect for persons: People's personal choices should be treated with respect for their self-determination, and those who are dependent or vulnerable should be afforded security against harm or abuse.
Beneficence: Refers to the ethical obligation to maximize benefits and to minimize harms.
Justice: Refers to the ethical obligation to treat each person in accordance with what is morally right and proper, and to give each person what is due to him or her.

Informed consent
Informed consent is a process by which a study participant voluntarily confirms his or her willingness to participate in a particular trial or study. The main goal of informed consent is to make sure that the study participant has understood the study and chooses freely whether to begin or continue participation in it. The essential elements of informed consent are information, comprehension, autonomy of the study participant, and consent. The information should consist of a statement of objectives, voluntary participation and withdrawal, an explanation of the selection criteria, a description of discomforts and risks, the expected costs and benefits of participation, and compensation in case of injury. In short, it is a process that answers: Who? When? How?

Roles of ethical Review Board/Committee
Ethical review committees may be created at the level of national or local representative bodies. The main responsibilities of an ethical review board/committee are:
- to determine that the proposed research is scientifically sound;
- to ensure that all other ethical concerns arising from a protocol are satisfactorily resolved, both in principle and in practice;
- to consider the qualifications of the investigators, including their education in the principles of research practice; and
- to keep records of decisions and to take measures to follow up on the conduct of ongoing research projects.

CHAPTER 6 INSTRUMENTS OF DATA COLLECTION

Type of Instruments
A data collection instrument is a document used for gathering and recording data in a survey. Basically, there are two types of instruments for collecting data: structured and unstructured questionnaires.

A structured questionnaire contains written questions that people respond to directly on the questionnaire form itself, with or without the aid of an interviewer.

An unstructured questionnaire (a checklist of topics) is used, mostly in qualitative surveys, when the enquiries are not appropriate for a structured questionnaire. It contains mostly open-ended questions and is used in informal or exploratory surveys.

Most questionnaires used in sample surveys combine structured and unstructured questions. Since the questionnaire is the main data collection instrument in a formal sample survey, this chapter discusses the issues involved in questionnaire design and other activities related to it.

Principles of Designing Questionnaire
o All surveys involve presenting respondents with a series of questions to be answered.
o The questions may be simple single-item measures or complex multiple-item scales.

One major contributory element in maintaining data quality in a formal sample survey is the questionnaire design. In this approach, questionnaires need to be structured, and their design is critical because survey analysis depends on the completeness of the topics covered. Error-free data transfer requires clear, comprehensive questions, good enumeration, and clearly set out answers. Much of this process depends on good questionnaire design.
o In this respect there are some questionnaire design principles which link the interview and data processing.
o Regarding content, one must include the minimum number of topics needed to meet the objectives.
o The time for the interview is another factor that must be kept reasonable, and this limits the number of questions.
o The questions must be easy for the respondents to understand and to answer accurately and clearly.
o The questionnaire should be easy to use as an interview guide for the enumerator and as an instrument for recording answers.
o The questionnaire should be self-contained: it should include identification of the enumerator and the respondent, the date of the interview, and any other reference information, such as geographical identification and other field details.
o A typical sequence of activities for designing a form has the following pattern:
- Draw up a list of question topics from a mixture of theoretical models, empirical information, research evidence, and the terms of reference for the study;
- For each topic, phrase the specific information required;
- List the questions in a logical order, following either a chronological or a sequential pattern;
- Decide for each question how to record the interview response;
- Make a first draft layout on the style of paper to be used;
- Test the design on model respondents;
- Prepare a pilot draft for a pilot or test survey;
- Modify the form in light of the results of the test; and
- Finalize the design and layout.
Review the number of questions finally listed as many times as possible.

Type of Questions
Two basic types of questions can be used in questionnaires: open-ended and closed-ended questions, depending on the amount of freedom given to the respondent in offering responses. The type of question to use will be determined by the form of response sought, the nature of the respondents, and their ability to answer the questions.
Closed-ended Question
A closed-ended question is one where a predetermined list of alternative responses is presented to the respondent, who checks the appropriate one(s). It implies that the respondents' answers are restricted in some way to a limited range of alternatives. It falls into one of two categories: dichotomous and multiple-choice questions.

There are two categories of multiple-choice questions: the single-coded question, where the respondent is permitted to check one and only one response; and the multi-coded question, which allows the respondent to select as many responses as are applicable.
o Do you have a bank account? Yes = 1, No = 2
o How many children have you ever borne? 1 = 1-2, 2 = 3-4, 3 = 5-6, 4 = 7-8, 5 = >8
o Which type of soft drink(s) does your household consume? 1 = Pepsi-Cola, 2 = Coca-Cola, 3 = Mirinda, 4 = Fanta, 5 = Sprite, 6 = Seven-Up, 7 = Others, Specify--------

The choice can be made by making a mark alongside a category, by entering a numeric value, or by selecting a code from a code list. Setting categories of responses requires skill and experience in the area of study, and it suits computer processing.

The advantages of closed response categories are that:
- It is easier and quicker for respondents to answer.
- The answers of different respondents are easier to standardize and to compare.
- The answers are easier to code and to analyse statistically.
- The question's meaning is often made clearer by the response categories.
- The answers are relatively complete as long as all relevant categories are specified.
- Respondents are more likely to answer questions about sensitive topics.
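The difference between single-coded and multi-coded questions can be made concrete with a small Python sketch based on the example questions above. The dictionary names and the helper functions `is_valid_single` and `is_valid_multi` are hypothetical, and the validation rules (exactly one code, or one or more distinct codes) are an assumed reading of the definitions given in the text.

```python
# Code lists taken from the example questions; the variable names are illustrative.
BANK_ACCOUNT = {1: "Yes", 2: "No"}  # dichotomous, single-coded
SOFT_DRINKS = {
    1: "Pepsi-Cola", 2: "Coca-Cola", 3: "Mirinda", 4: "Fanta",
    5: "Sprite", 6: "Seven-Up", 7: "Others",
}  # multi-coded


def is_valid_single(code, code_list):
    """A single-coded answer must be exactly one code from the list."""
    return code in code_list


def is_valid_multi(codes, code_list):
    """A multi-coded answer is one or more distinct codes from the list."""
    return (
        len(codes) > 0
        and len(set(codes)) == len(codes)
        and all(code in code_list for code in codes)
    )


print(is_valid_single(1, BANK_ACCOUNT))      # True
print(is_valid_single(3, BANK_ACCOUNT))      # False: no such category
print(is_valid_multi([1, 4], SOFT_DRINKS))   # True
print(is_valid_multi([1, 1], SOFT_DRINKS))   # False: duplicate code
```

Checks of this kind are one way to catch recording mistakes at data entry, for instance a tick recorded against a category that does not exist.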
The disadvantages of closed responses are that:
- Respondents can guess at answers when they do not know, since they have the categories to guide them.
- The appropriate category may be missing from the schedule.
- Failure to understand the question is less easily detected than with an open-ended question.
- A poorly planned list may act as a constraint on correct answers that are not catered for.
- Too few categories may fail to differentiate between important groups.
- Enumerator error (placing the tick in the wrong box by accident) will be more common.

Open-ended Question
An open-ended (unstructured, free response) question is one which allows the respondent to answer freely in his or her own words, and to express any ideas generated by the question itself. "Open" implies that the respondent is permitted to answer in any form and at any length, without any limitation on the range or complexity of the answer to the question asked. For example: 'Which crops do you grow?' The question does not specify any particular season, crops, or plots, and hence many answers are possible. It is open for discussion.

The advantages of open-ended responses are:
- They permit an unlimited number of possible answers, which may not have been considered at the initial stage of the question's design.
- Respondents can answer in detail and can qualify and clarify responses by expressing them in their own words.
- Unanticipated findings can be discovered.
- They permit creativity, self-expression, and richness of detail.
- They may be used when there are too many response categories to list on a questionnaire.
- They are useful when the questions are too complex to reduce to a few standard responses.

The disadvantages of open-ended responses are:
- Much irrelevant information is collected.
- The answers are not standardized and are therefore difficult to compare and to analyse statistically.
- Coding responses is difficult.
- They require a higher level of skill on the part of the data collector, since responses are written verbatim.
- More time, thought, and effort are necessary for completion.
- The forms are often bulky, because answers take up a lot of space in the questionnaire.

Question Layout
In questionnaire design, as a general principle, questions should be presented in a logical order, designed to follow a natural sequence. Four basic alternatives are found in the layout of questions:
- A verbatim listing of every question, with complete wording and instructions on the progression of the respondent through the form.
- A listing of questions in a specific order, but without full or precise wording of the questions or instructions for progression through the form.
- A tabular row and column format in which spaces are indicated for responses, usually in coded form, without any specification of questions.
- A checklist of topics, indicating key facts to be covered, but with answers recorded either in an unstructured way in a field notebook or in a simplified row/column table.

Common Problems of Question Writing (Phrasing)
o Another aspect of questionnaire design that needs serious consideration is the phrasing of the questions.
o At each stage the question should have a clear meaning, the same meaning to every person asked and to the researcher, an answer which the respondent knows, and an answer which can be given clearly and unambiguously by the respondent.

Leading Questions
o A leading question is one that, by its wording, leads the respondent to choose one response over another. The presentation of questions should be neutral.

Multiple Questions
o Multiple (double-barreled) questions are questions which combine two or more distinct questions into one single question.
Ambiguous Question
o Ambiguity, confusion, and vagueness must be avoided in a question, since different people will understand the question differently and, in effect, the interpretation will depend on the individual respondent.

Use Simple Language
o The language of a question should be simple. The aim in question wording is to communicate with respondents as nearly as possible in their own language.

Sensitive Topics
o In some cultures people do not like to discuss private matters openly. Sensitive questions are apt to be irritating, threatening, or embarrassing to the respondent.
o Such questions are prone to normative answers: answers which confirm that the respondent acts within the rules of society, even if that particular individual sometimes acts outside these rules.

Choice of the Reference Period
o During questionnaire design, the choice of an appropriate time reference period is an extremely important consideration.
o The time-reference period is the specified length of time for which the respondent is asked to give information about events occurring within it.