Some Basic Concepts of Sampling PDF

Document Details

Symbiosis International University

Dr. Madhulika Mishra

Tags

sampling statistics research methods social research

Summary

This presentation introduces basic sampling concepts in social research. It discusses the difference between a population and a sample, highlighting the advantages of using a sample in research. The presentation covers applications of sampling in new product development and customer satisfaction surveys.

Full Transcript

Some basic concepts of sampling

Dr. Madhulika Mishra
Symbiosis Statistical Institute, Symbiosis International University, Pune, India

Sampling

Sampling consists of obtaining information about a larger group, or universe, by studying a part of it. A social researcher often has to collect information about a universe made up of a vast, differentiated population spread over a large territory, and must do so within a limited amount of time and money. Measuring or collecting information from each and every member of such a population is therefore not always possible. A part of the whole can, however, give sufficiently dependable information, provided the procedure followed in selecting that part has been scientific.

Population and Sample

A population or universe can be defined as any collection of persons, objects or events in which one is interested. For example, if we are studying the relationship between the class achievement of university students and the methods of teaching, then the students of the university form our population. If we are studying the voting behavior or political participation of the citizens of India, then all the citizens of India form the population.

By a sample we mean the aggregate of objects, persons or elements selected from the universe. It is a portion or subpart of the total population. Example: in the voting-behavior example above, instead of getting information from the whole population, we can select a part of the population of the country and ask them their opinion.

Advantages

Relatively low cost: where conducting a census requires immense resources, a sample survey is much more cost-effective.

Fast and Convenient: it is much easier to examine a selected sample of the population than to take a census.
The results can be more accurate: if the sample is selected with the right quality checks in place, it is possible to achieve highly accurate results.

Applications of Sampling

New Product Development: suppose a business firm is interested in launching a new product. One approach is to first listen to the market, that is, to its potential customers, and then launch the product according to their needs and demands. Here sampling can be used by considering a few customers and analyzing their demands or views about the product; more precisely, we can conduct a pilot survey.

Customer Satisfaction: many businesses have hundreds of customers per day. If we want to know the level of customer satisfaction, sampling the customers is one way to find out. Samples can be collected by an appropriate sampling technique, and inferences can then be drawn without studying the entire population.

Sampling Unit

An element or a group of elements on which observations can be taken is called a sampling unit. The objective of the survey determines the definition of the sampling unit. For example, if the objective is to determine the total income of all the persons in a household, then the sampling unit is the household. Similarly, if the objective is to determine the income of a particular person in the household, then the sampling unit is that individual person.

Representative Sample

A representative sample is one that contains all the salient features of the population. Example: if a population has 30% males and 70% females, then a representative sample should have nearly 30% males and 70% females. Similarly, if we take out a handful of wheat from a 100 kg bag of wheat, we expect the wheat in hand to be of the same quality as the wheat inside the bag.

Sampling Frame

The list of all the units of the population to be surveyed constitutes the sampling frame. All the sampling units in the sampling frame have identification particulars.
Example: all the students in a particular university, listed along with their roll numbers, constitute a sampling frame.

Advantages of Sampling over Complete Enumeration

Reduced cost and enlarged scope: sampling involves collecting data on a smaller number of units than complete enumeration does, so the cost of collecting the information is reduced. Further, additional information can be collected at little extra cost compared with conducting another survey. Example: if an interviewer is collecting information on health conditions, he or she can also ask some questions on health practices. This provides additional information on health practices at a much lower cost than an entirely new survey on health practices would.

Organisation of work: it is easier to organise the collection of data from a smaller number of units than from all the units in a census. For example, to draw a representative sample from a state, it is easier to draw small samples from every city than to draw the sample from the whole state at once. This tends to result in more accurate statistical inferences, because better organisation produces better data, and better data in turn give improved statistical inferences.

Ways to Ensure Representativeness

There are two ways of selecting a sample that is intended to be representative:

Random sample or probability sample: the selection of units from the population is governed by the laws of chance or probability. The probability of selection can be equal for all units or can differ from unit to unit.

Non-random sample or purposive sample: the selection of units from the population is not governed by the laws of probability. For example, the units may be selected on the basis of the personal judgement of the surveyor. Similarly, people who volunteer to take a medical test or to taste a new type of coffee form a non-random sample.
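The two kinds of probability selection described above can be illustrated with a short Python sketch. It assumes a hypothetical sampling frame of 10,000 students identified by roll numbers 1 to 10,000, of whom 30% are male (mirroring the 30%/70% example of a representative sample); the function names are illustrative, not taken from the presentation. The equal-probability sample is drawn without replacement with `random.sample`; the unequal-probability sample is drawn with replacement with `random.choices`, where each draw picks a unit with probability proportional to its weight.

```python
import random

Unit = tuple[int, str]  # (roll_number, sex) in a hypothetical sampling frame


def build_frame(n_units: int, male_share: float) -> list[Unit]:
    """Hypothetical frame: roll numbers 1..n_units; the lowest roll numbers are male."""
    n_male = round(n_units * male_share)
    return [(roll, "M" if roll <= n_male else "F") for roll in range(1, n_units + 1)]


def equal_probability_sample(frame: list[Unit], n: int, rng: random.Random) -> list[Unit]:
    """Probability sample with equal chances: simple random sampling without replacement."""
    return rng.sample(frame, n)


def unequal_probability_sample(
    frame: list[Unit], weights: list[float], n: int, rng: random.Random
) -> list[Unit]:
    """Probability sample with unequal chances: n draws with replacement,
    each draw picking unit i with probability weights[i] / sum(weights)."""
    return rng.choices(frame, weights=weights, k=n)


def share_male(units: list[Unit]) -> float:
    """Proportion of units in the list whose sex is recorded as 'M'."""
    return sum(1 for _, sex in units if sex == "M") / len(units)


if __name__ == "__main__":
    rng = random.Random(2024)  # fixed seed so the run is reproducible
    frame = build_frame(10_000, male_share=0.30)
    srs = equal_probability_sample(frame, 1_000, rng)
    print(f"share of males in the frame:  {share_male(frame):.3f}")
    print(f"share of males in the sample: {share_male(srs):.3f}")  # typically near 0.30
```

Because selection is left to chance, the share of males in the sample changes slightly from seed to seed but stays close to the frame's 30%; no comparable probabilistic statement can be made for a sample picked by personal judgement.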
Greater Accuracy: the persons involved in the collection of data are trained personnel. They can collect data more accurately when they have to cover a smaller number of units in a given time than when they have to cover a large number.

Urgent Information Required: the data from a sample can be summarised quickly. For example, crop production can be forecast more quickly on the basis of a sample than by first collecting all the observations.

Feasibility: conducting an experiment on a smaller number of units is more feasible, particularly when the units are destroyed in the process. For example, in determining the life of bulbs, it is more feasible to use a minimum number of bulbs. Similarly, in a medical experiment it is more feasible to use fewer animals.

Types of Population

There are two types of population: finite and infinite.

Finite Population: if the number of objects or units in the population is countable, it is said to be a finite population. For example, the houses in a suburb form a finite population.

Infinite Population: if the number of objects or units in the population is infinite, it is said to be an infinite population. For example, the stars in the sky are usually treated as an infinite population.

In general, the population is denoted by Ω and its size by N. In the case of an infinite population, N → ∞.

Target Population: a finite or infinite population about which we require information is called the target population. For example, all 18-year-old girls in the United States of America.

Study Population: this is the basic finite set of individuals we intend to study. For example, all 18-year-old girls whose permanent address is in New York.

Properties of a Population

The population should be properly defined, so that there is no ambiguity as to whether a given unit belongs to it. The definition should cover all the information relevant to deciding this.
For example, in a survey of achievement in mathematics, a researcher will have to define the population of students by age or by grade and, if necessary, will also specify the type of schools, the geographical area and the academic year for which the data will be collected.

The nature of the units in the population should be well defined: inferences concerning a population cannot be drawn until the nature of the units that comprise it is clearly identified. If a population is not properly defined, the researcher does not know which units to consider when selecting the sample.

Characteristics of a Sample

Sample Should Be Unbiased: it is a basic requirement of inferential research that the sample be free from bias. In other words, it should be representative of the population. A representative sample is one in which all the characteristics of the population are present in the same amount or intensity as in the population.

Well-Defined Sampling Plan: a well-defined sampling plan has to be prepared. This means a plan which, if properly executed, can guarantee the following: if we were to repeat the study on a number of different samples, each of a particular size drawn from the given population, our findings would differ from those we would get by studying the whole population by more than a specified amount in no more than a specified proportion of the samples. For example, the plan might guarantee that, out of 100 such samples, the sample estimate is within the specified margin of the population value in 95 of them. If the plan guarantees with sufficiently high probability that the selected sample is representative of the population, it is known as a representative sampling plan. It ensures that diverse elements are selected and that these diverse elements are adequately represented in the sample.

Principal Steps in a Sample Survey

1. Objective of the Survey: the first step is to define the objectives of the survey in clear and concrete terms.
It is often found that even the sponsoring agency is not quite clear about what it wants and how it is going to use the results. The sponsors of the survey should make sure that the objectives are commensurate with the available resources in terms of money and manpower, and with the time limit within which the results are required.

2. Defining the Population to Be Sampled: the population, that is, the aggregate of objects (animate or inanimate) from which the sample is chosen, should be defined in clear and unambiguous terms.

3. The Frame and Sampling Unit: for the purpose of sample selection, the population must be divisible into what are called sampling units. The sampling units must cover the entire population, and they must be distinct, unambiguous and non-overlapping, in the sense that every element of the population belongs to one and only one sampling unit.

4. Data to Be Collected: the data should be collected keeping in view the objectives of the survey. The tendency to collect a lot of irrelevant data should be avoided.

5. The Questionnaire or Schedule: having decided on the type of data to be collected, the next important part of the sample survey is the construction of the questionnaire, which requires skill, special techniques and familiarity with the subject matter under study. The questionnaire should be brief, clear, inoffensive, unambiguous and to the point, so that little scope for guessing is left to the respondent.

6. Method of Collecting Information: there are basically two methods.
Interview method: the interviewer approaches the respondent personally, asks the questions one by one and fills in the questionnaire according to the information supplied.
Mailed questionnaire: the questionnaire is mailed to the respondents, who are given a deadline to fill it in and send it back to the interviewer.

7. Non-Respondents: quite often, because of practical difficulties, data cannot be collected from all the sampled units. This can happen for several reasons, for example:
(i) the respondent may be absent when the investigator visits;
(ii) the respondent may refuse to answer the questions.
Such incompleteness is called non-response. Proper procedures should be followed to deal with these situations.

8. Selection of a Proper Sampling Design: the size of the sample, the procedure for selecting the sample and the method of estimating the population parameters should be properly defined.

9. Summary and Analysis of Data
Scrutiny and editing of data: the supervising staff should carry out initial quality checks while the investigators are in the field. The schedules, or the information recorded in the questionnaires, should also be checked and collected.
Tabulation of data: before tabulation, the data must be checked for non-response. Code numbers can be assigned to qualitative variables when recording the data in a table.
Statistical analysis: after the data have been scrutinised, edited and tabulated, a careful statistical analysis should be carried out.

Errors

The errors involved in the collection, processing and analysis of data can be broadly classified into two categories: sampling errors and non-sampling errors.

Sampling Errors

These errors originate in sampling and arise because only a part of the population, i.e., a sample, has been used to estimate parameters and draw inferences about the population. Sampling errors are therefore absent in complete enumeration. They arise for several reasons:

Faulty selection of the sample: some bias is introduced by the technique used to select the sample. For example, in purposive sampling the investigator deliberately selects what he or she considers a "representative" sample in order to obtain certain results. This can be avoided by using a suitable random sampling technique.
Substitution: if difficulty arises in enumerating a particular sampling unit included in the random sample, investigators often substitute a convenient member of the population. This obviously leads to some bias, since the characteristics possessed by the substituted unit will usually differ from those possessed by the unit originally included in the sample.

Faulty demarcation of sampling units: bias due to defective demarcation of sampling units is particularly significant in area surveys, such as agricultural field experiments or crop-cutting surveys. In such surveys, whether borderline cases are included in the sample depends more or less on the discretion of the investigator.

Improper choice of statistic: the statistic used to estimate a population parameter can itself introduce bias. For example, given a random sample $x_1, x_2, \ldots, x_n$ with sample mean $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, the sample variance $s^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$ is a biased estimate of the population variance, whereas the sample mean square $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$ is an unbiased estimate.

An increase in sample size usually results in a decrease in sampling error.

Non-Sampling Errors

These errors arise at the stages of observation, ascertainment and processing of the data, and are therefore present in both complete enumeration surveys and sample surveys. Thus the data obtained in a complete census, although free from sampling errors, are still subject to non-sampling errors, whereas data obtained in a sample survey are subject to both sampling and non-sampling errors. Non-sampling errors can occur at every stage of the planning or execution of a census or sample survey. Some of the major non-sampling errors arise for the following reasons:

1. Faulty planning or definitions: the planning of a survey consists in explicitly stating the objectives of the survey.
These objectives are then translated into (i) a set of definitions of the characteristics for which data are to be collected and (ii) a set of specifications for collecting, processing and publishing the data. Here the non-sampling errors can be due to:
a. data specifications that are inadequate or inconsistent with the objectives of the survey;
b. errors in locating the units and in actually measuring the characteristics, errors in recording the measurements, errors due to ill-designed questionnaires, etc.;
c. a lack of trained and qualified investigators and of adequate supervisory staff.

2. Response errors: these errors are introduced by the responses furnished by the respondents and may be due to any of the following reasons.
(i) Accidental errors: for example, the respondent may misunderstand a particular question and unintentionally furnish incorrect information.
(ii) Prestige bias: an appeal to the pride or prestige of the person interviewed may introduce another kind of bias, called prestige bias, by virtue of which the respondent may overstate his or her education, IQ, occupation, income, etc., or understate his or her age, resulting in wrong answers.
(iii) Self-interest: quite often, to safeguard their own interests, respondents may give incorrect information; e.g., a person may underestimate his or her salary or production and overestimate expenses or requirements.
(iv) Bias due to the interviewer: the interviewer may affect the accuracy of the responses by the way he or she asks the questions or records the answers. Information obtained on the interviewer's suggestion is very likely to be influenced by the interviewer's beliefs and prejudices.
(v) Failure of the respondent's memory: one source of error common to most methods of collecting information is that of recall.
Many of the questions in surveys refer to happenings or conditions in the past, and there is a problem both of remembering the event and of associating it with the correct time period.

3. Non-response biases: these occur when full information is not obtained on all the sampling units. This can happen if the respondent is not found at home even after repeated calls, if he or she is unable to furnish information on all the questions, or if he or she refuses to answer certain questions. Some bias is therefore introduced because a section of the population with certain peculiar characteristics is excluded through non-response.

4. Errors in coverage: these arise if the objective of the survey is not stated precisely in clear-cut terms. This may result in (i) the inclusion in the survey of certain units that should not be included, or (ii) the exclusion of certain units that should be included under the objective. For example, in a census to determine the number of individuals in the age group of, say, 20 to 50 years, more or less serious errors may occur in deciding whom to enumerate unless the particular community or area, and the date at which age is to be reckoned, are specified.

5. Compiling errors: the various data-processing operations, such as editing and coding of the responses, tabulation and summarising of the original observations made in the survey, are a potential source of error. Compiling errors can be controlled through verification and consistency checks.

6. Publication errors: publication errors, i.e., errors committed during the presentation and printing of tabulated results, are basically due to two sources. The first is the mechanics of publication, i.e., proofreading errors. The other, which is more serious, is the failure of the survey organisation to point out the limitations of the statistics.

In a sample survey, non-sampling errors may also arise from a defective frame and faulty selection of sampling units.
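The way non-response introduces bias can be seen in a deterministic toy example. The incomes and the response pattern below are hypothetical: ten sampled households, of which the four with the highest incomes do not respond. Averaging only over the respondents excludes that section of the sample, so the estimate is pulled downward however carefully the remaining responses are recorded.

```python
from statistics import mean


def respondent_mean(values: list[float], responded: list[bool]) -> float:
    """Mean computed only over the sampled units that actually responded."""
    return mean(v for v, r in zip(values, responded) if r)


# Hypothetical monthly incomes of 10 sampled households (arbitrary units).
incomes = [10, 12, 15, 18, 20, 25, 40, 60, 80, 120]
# Hypothetical response pattern: the four highest-income households refuse.
responded = [True] * 6 + [False] * 4

full_mean = mean(incomes)                            # 400 / 10 = 40
observed_mean = respondent_mean(incomes, responded)  # 100 / 6, about 16.67
bias = observed_mean - full_mean                     # about -23.33

print(f"mean over all sampled units: {full_mean}")
print(f"mean over respondents only:  {observed_mean:.2f}")
print(f"non-response bias:           {bias:.2f}")
```

Enlarging the sample would not remove this bias if the same kind of household kept refusing; it has to be tackled through measures such as repeated calls on the non-respondents.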
Non-sampling errors are likely to be more serious in a complete census than in a sample survey, since in a sample survey they can be reduced to a greater extent by employing qualified, trained and experienced personnel, better supervision, and better equipment for processing and analysing the relatively smaller volume of data.

Sampling error usually decreases as the sample size increases. On the other hand, as the sample size increases, non-sampling error tends to increase. Thus, as the sample size increases, non-sampling error is likely to behave in the opposite way to sampling error. Quite often, the non-sampling error in a complete census is greater than the sampling and non-sampling errors of a sample survey taken together. In such situations a sample survey is clearly to be preferred to a complete enumeration survey.

Types of Sampling

There are two types of sampling: probability sampling and non-probability sampling.

Probability Sampling

Probability sampling is a sampling technique in which the researcher selects the sample from a larger population using a method based on the theory of probability. Its most critical requirement is that every member of the population has a known, non-zero chance of being selected; in the simplest designs this chance is the same for everyone. For example, if one person is drawn with equal probability from a population of 100 people, every person has a 1 in 100 chance of being selected. Probability sampling gives you the best chance of creating a sample that is truly representative of the population.

Types of probability sampling: simple random sampling, stratified random sampling, systematic sampling and cluster sampling.

Simple Random Sampling

Simple random sampling is the technique of drawing a sample such that each and every unit of the population has an equal and independent chance of being included in the sample. In this method, an equal probability of selection is assigned to each unit of the population at the first draw.
It also implies an equal probability of selecting any unit from the available units at the subsequent draws.

Merits: since the sample units are selected at random, giving each unit an equal chance of being selected, the element of subjectivity or personal bias in the selection is eliminated. As such, a simple random sample tends to be more representative of the population than a purposively selected one.

Demerits:
The selection of a simple random sample requires an up-to-date frame, i.e., a completely catalogued population from which samples are to be drawn. Frequently it is virtually impossible to identify the units in the population before the sample is drawn, and this restricts the use of simple random sampling techniques.
A simple random sample may result in the selection of sampling units that are widely spread geographically, in which case the cost of collecting the data may be high in terms of time and money.

Stratified Sampling

Quite often we come across populations that are heterogeneous with respect to the characteristic under study. In such cases we divide the whole population into relatively homogeneous sub-groups called strata. From each stratum, a sample of a given size is drawn using any sampling design. The ultimate sample is the combination of the sample units from each and every stratum. Stratification means division into layers. Auxiliary information (past data or other information) related to the characteristic under study may be used to divide the population into groups such that the units within each group are as homogeneous as possible.

Merits: in an unstratified random sample some strata may be over-represented, some under-represented, and some excluded altogether. Stratified sampling ensures any desired representation of the various strata of the population in the sample. It thus provides a more representative cross-section of the population and is often regarded as one of the most efficient sampling designs. It usually provides estimates with increased precision.
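A minimal Python sketch of stratified random sampling follows. It assumes proportional allocation, i.e., each stratum receives a share of the total sample proportional to its size (n_h = n N_h / N), which is only one of several possible allocation rules, together with a simple random sample without replacement inside each stratum. The stratum names, the function names and the largest-remainder rounding are illustrative assumptions, not part of the presentation.

```python
import random
from statistics import mean


def proportional_allocation(stratum_sizes: dict[str, int], n: int) -> dict[str, int]:
    """Allocate n sample units to strata in proportion to stratum size N_h.

    Each stratum first gets floor(n * N_h / N); the few units left over go to the
    strata with the largest remainders, so the allocations always add up to n.
    """
    N = sum(stratum_sizes.values())
    alloc = {h: n * N_h // N for h, N_h in stratum_sizes.items()}
    remainder = {h: n * N_h % N for h, N_h in stratum_sizes.items()}
    leftover = n - sum(alloc.values())
    for h in sorted(remainder, key=remainder.get, reverse=True)[:leftover]:
        alloc[h] += 1
    return alloc


def stratified_sample(
    strata: dict[str, list[float]], n: int, rng: random.Random
) -> dict[str, list[float]]:
    """Independent simple random sample (without replacement) within each stratum."""
    sizes = {h: len(units) for h, units in strata.items()}
    alloc = proportional_allocation(sizes, n)
    return {h: rng.sample(units, alloc[h]) for h, units in strata.items()}


def stratified_mean(strata: dict[str, list[float]], sample: dict[str, list[float]]) -> float:
    """Estimate of the population mean: sum over strata of (N_h / N) * stratum sample mean.

    Assumes every stratum received at least one sample unit.
    """
    N = sum(len(units) for units in strata.values())
    return sum(len(strata[h]) / N * mean(sample[h]) for h in strata)


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical heterogeneous population split into two internally homogeneous strata.
    strata = {
        "urban": [rng.gauss(50, 5) for _ in range(6_000)],
        "rural": [rng.gauss(20, 5) for _ in range(4_000)],
    }
    sample = stratified_sample(strata, 500, rng)
    print({h: len(s) for h, s in sample.items()})  # {'urban': 300, 'rural': 200}
    print(f"estimated mean: {stratified_mean(strata, sample):.2f}")
```

With proportional allocation each stratum appears in the sample in the same proportion as in the population, which is the kind of representation an unstratified sample cannot guarantee.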
Some Problems

The main problems in stratified random sampling are:
(i) the principle of stratification, i.e., the proper classification of the population into strata;
(ii) given the total sample size, how to allocate it among the different strata;
(iii) the value of k, the number of strata, etc.

Systematic Sampling

Systematic sampling is a commonly employed technique when a complete and up-to-date list of sampling units is available. It consists in selecting only the first unit at random, the rest being selected automatically according to a predetermined pattern involving regular spacing of units. Suppose the N sampling units are numbered from 1 to N in some order and a sample of size n is to be drawn such that N = nk, where k, usually called the sampling interval, is an integer. It consists in drawing a random number, say i
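The selection rule for systematic sampling can be sketched in Python under the stated assumption N = nk. The sketch assumes the usual linear systematic scheme: a random start i is drawn from 1 to k, and the sample then consists of units i, i + k, i + 2k, ..., i + (n - 1)k. The function name and the 1-based labelling of units are illustrative choices.

```python
import random


def systematic_sample(N: int, n: int, rng: random.Random) -> list[int]:
    """Linear systematic sample of unit labels 1..N, assuming N = n * k with k an integer."""
    if n <= 0 or N % n != 0:
        raise ValueError("this sketch assumes N is an exact multiple of n (N = nk)")
    k = N // n                            # sampling interval
    i = rng.randint(1, k)                 # random start; randint includes both end points
    return [i + j * k for j in range(n)]  # i, i + k, ..., i + (n - 1)k


if __name__ == "__main__":
    rng = random.Random(5)
    print(systematic_sample(N=100, n=10, rng=rng))  # ten labels spaced 10 apart
```

Only the random start involves chance; once i is fixed, the whole sample is determined. Each unit belongs to exactly one of the k possible samples, so every unit has the same inclusion probability 1/k = n/N.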
