Business Research Methods (BRM) Session 4 PDF
Indian Institute of Management, Tiruchirappalli
Dr. Gopinath Krishnan
Summary
This document provides an overview of business research methods, specifically focusing on sampling techniques and data preparation. It describes the difference between sampling and census, outlining various types of sampling methodologies, and detailing application examples.
Full Transcript
Business Research Methods
Session 4 – Sampling and Data Preparation

Dr. Gopinath Krishnan
Assistant Professor, Information Systems and Analytics
Indian Institute of Management Tiruchirappalli

Sampling vs. Census

Population: The population is the entire group of individuals or items that you want to study. Example: All employees in a company.

Population Parameter: A population parameter is a numerical value that describes a characteristic of the entire population. Example: The average age of all employees in the company.

Census: A census involves collecting data from every member of the population. Example: The Census of India collects data from every resident in the country.

Sample: A sample is a subset of the population selected for study. Example: Surveying 200 out of 2,000 employees in a company.

Statistic: A statistic is a numerical value calculated from a sample and used to estimate a population parameter. Example: The average age of the 200 employees in your sample.

Conditions Favoring the Use of a Sample vs. a Census

Condition                              Sample         Census
1. Budget                              Small          Large
2. Time available                      Short          Long
3. Population size                     Large          Small
4. Variance in the characteristic      Small          Large
5. Cost of sampling errors             Low            High
6. Cost of nonsampling errors          High           Low
7. Nature of measurement               Destructive    Nondestructive
8. Attention to individual cases       Yes            No

Sampling

Sampling is the process of selecting a subset of individuals, items, or observations from a larger population to estimate characteristics of the entire population. Rather than examining every member of the population (which can be costly and time-consuming), researchers and analysts use sampling to make inferences about the whole group.

Example: Imagine you're working for a company that wants to understand customer satisfaction. Instead of surveying all 10,000 customers, you might survey a sample of 500 customers.
The insights gained from these 500 customers will help you understand the satisfaction levels of the entire customer base.

Where are samples effectively used?

- In market research, companies use sampling to gather opinions from a small group of consumers to predict the preferences of a broader market.
- Health researchers use sampling to study the prevalence of diseases in different communities.
- In quality control, manufacturers often sample a few products from a production batch to assess quality, rather than inspecting every single item.
- Pollsters sample voters to predict election outcomes or understand public opinion on various issues.

Sampling Design Process

Define the Target Population

The target population is the collection of elements or objects that possess the information sought by the researcher and about which inferences are to be made. The target population should be defined in terms of:
1. Elements
2. Sampling units
3. Extent
4. Time

Element: An element is the object about which or from which the information is desired, e.g., the respondent. Example: A female customer aged 18 who purchases Revlon lipstick.

Sampling Unit: A sampling unit is the unit from which data is collected and may include one or more elements. Example: A group of female customers aged 18-35 who purchase Revlon lipstick.

Extent: Extent refers to the scope or boundaries of the population or sample, including geographical and temporal aspects. Example: All female customers aged 18-35 who purchased Revlon lipstick across the United States over the past year.

Time: Time refers to the specific period during which data is collected or analyzed. Example: A survey on customer satisfaction with Revlon lipstick conducted among female customers aged 18-35 over a six-month period.
Determinants of Sample Size

Important qualitative factors in determining the sample size are:
- the importance of the decision
- the nature of the research
- the number of variables
- the nature of the analysis
- sample sizes used in similar studies
- completion rates
- resource constraints

Sample Sizes Used in Research Studies

Type of Study                                               Minimum Size   Typical Range
Problem identification research (e.g., market potential)    500            1,000–2,500
Problem-solving research (e.g., pricing)                    200            300–500
Product tests                                               200            300–500
Test-marketing studies                                      200            300–500
TV/radio/print advertising (per commercial or ad tested)    150            200–300
Test-market audits                                          10 stores      10–20 stores
Focus groups                                                2 groups       6–15 groups

Determine the Sampling Frame

A sampling frame represents the elements of the target population, either as a list or as a set of directions for identifying them. Examples include telephone books, industry directories, commercial mailing lists, city directories, and maps. If a list cannot be compiled, alternative identification methods such as random digit dialling procedures may be used.

Sampling frames can lead to errors by omitting relevant elements or including irrelevant ones. Sampling frame error can be addressed by redefining the target population to match the sampling frame, or by screening respondents during data collection.

Classification of Sampling Techniques

Convenience Sampling

Convenience sampling attempts to obtain a sample of convenient elements. Often, respondents are selected because they happen to be in the right place at the right time. Examples:
- Use of students and members of social organizations
- Mall intercept interviews without qualifying the respondents
- "People on the street" interviews

Figure: A Graphical Illustration of Convenience Sampling

Judgmental Sampling

Judgmental sampling is a form of convenience sampling in which the population elements are selected based on the judgment of the researcher.
Examples:
- Test markets
- Purchase engineers selected in industrial marketing research
- Department stores selected to test a new merchandising display system
- Expert witnesses used in court

Figure: A Graphical Illustration of Judgmental Sampling

Quota Sampling

Quota sampling may be viewed as two-stage restricted judgmental sampling. The first stage consists of developing control categories, or quotas, of population elements. In the second stage, sample elements are selected based on convenience or judgment.

                          Population Composition   Sample Composition
Control Characteristic    Percentage               Percentage   Number
Sex
  Male                    48                       48           480
  Female                  52                       52           520
  Total                   100                      100          1,000

Figure: A Graphical Illustration of Quota Sampling

Snowball Sampling

In snowball sampling, an initial group of respondents is selected, usually at random. After being interviewed, these respondents are asked to identify others who belong to the target population of interest. Subsequent respondents are selected based on the referrals. Examples:
- Market research in niche industries
- Understanding consumer behavior in unregulated markets

Figure: A Graphical Illustration of Snowball Sampling

Simple Random Sampling

Each element in the population has a known and equal probability of selection. Each possible sample of a given size (n) has a known and equal probability of being the sample actually selected. This implies that every element is selected independently of every other element.

Figure: A Graphical Illustration of Simple Random Sampling

Systematic Sampling

The sample is chosen by selecting a random starting point and then picking every i-th element in succession from the sampling frame. The sampling interval, i, is determined by dividing the population size N by the sample size n and rounding to the nearest integer.

For example, suppose there are 100,000 elements in the population and a sample of 1,000 is desired. In this case the sampling interval, i, is 100. A random number between 1 and 100 is selected.
If, for example, this number is 23, the sample consists of elements 23, 123, 223, 323, 423, 523, and so on.

When the ordering of the elements is related to the characteristic of interest, systematic sampling increases the representativeness of the sample. If the ordering of the elements produces a cyclical pattern, systematic sampling may decrease the representativeness of the sample.

Figure: A Graphical Illustration of Systematic Sampling

Stratified Sampling

Stratified sampling is a two-step process in which the population is partitioned into subpopulations, or strata. The strata should be mutually exclusive and collectively exhaustive, in that every population element should be assigned to one and only one stratum and no population elements should be omitted. Next, elements are selected from each stratum by a random procedure, usually simple random sampling (SRS). A major objective of stratified sampling is to increase precision without increasing cost.

The elements within a stratum should be as homogeneous as possible, but the elements in different strata should be as heterogeneous as possible. The stratification variables should also be closely related to the characteristic of interest. Finally, the variables should decrease the cost of the stratification process by being easy to measure and apply.

In proportionate stratified sampling, the size of the sample drawn from each stratum is proportionate to the relative size of that stratum in the total population. In disproportionate stratified sampling, the size of the sample from each stratum is proportionate to the relative size of that stratum and to the standard deviation of the distribution of the characteristic of interest among all the elements in that stratum.
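The two allocation rules described above can be illustrated with a minimal sketch. The function names, the age-group strata, their sizes, and the standard deviations below are hypothetical and chosen only for illustration; the allocations are left as unrounded values, since rounding to whole respondents is a separate choice.

```python
def proportionate_allocation(stratum_sizes, n):
    """Allocate a total sample of n in proportion to each stratum's size."""
    total = sum(stratum_sizes.values())
    return {name: n * size / total for name, size in stratum_sizes.items()}


def disproportionate_allocation(stratum_sizes, stratum_sds, n):
    """Allocate n in proportion to (stratum size x stratum standard deviation)."""
    weights = {name: stratum_sizes[name] * stratum_sds[name] for name in stratum_sizes}
    total = sum(weights.values())
    return {name: n * w / total for name, w in weights.items()}


# Hypothetical strata: customers by age group (sizes and SDs are made up).
sizes = {"18-25": 4000, "26-35": 4000, "36-50": 2000}
sds = {"18-25": 1.0, "26-35": 2.0, "36-50": 1.0}

prop = proportionate_allocation(sizes, 500)
disp = disproportionate_allocation(sizes, sds, 500)

print("Proportionate:   ", prop)
print("Disproportionate:", disp)
```

In this made-up example the 26-35 stratum has the same size as the 18-25 stratum but twice the variability, so the disproportionate rule assigns it a larger share of the 500 interviews than the proportionate rule does.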
Applications of Stratified Sampling

- Customer Satisfaction Surveys: A company like Revlon might use stratified sampling to survey customers across different age groups, such as 18-25, 26-35, and 36-50, to ensure that feedback is gathered from each segment, reflecting diverse preferences and needs.
- Employee Engagement Studies: A large corporation could employ stratified sampling by department (e.g., marketing, sales, production, and R&D) to measure employee engagement levels, ensuring that the results reflect the unique experiences and challenges of each department.
- Market Research: A business launching a new product might stratify the population by geographic region (e.g., urban, suburban, rural) to understand how regional differences affect consumer behavior and preferences.
- Product Testing: For testing a new cosmetic product, a company could use stratified sampling by skin type (e.g., oily, dry, combination) to ensure that the product's effectiveness is evaluated across all relevant consumer groups.
- Sales Analysis: A retailer might stratify customers by income level to analyze purchasing patterns and tailor marketing strategies to different economic segments, ensuring that promotions are effective across all income groups.

Figure: A Graphical Illustration of Stratified Sampling

Cluster Sampling

The target population is first divided into mutually exclusive and collectively exhaustive subpopulations, or clusters. Then a random sample of clusters is selected, based on a probability sampling technique such as SRS. For each selected cluster, either all the elements are included in the sample (one-stage) or a sample of elements is drawn probabilistically (two-stage).

Elements within a cluster should be as heterogeneous as possible, but clusters themselves should be as homogeneous as possible. Ideally, each cluster should be a small-scale representation of the population.
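The one-stage and two-stage procedures described above can be sketched as follows. This is an illustrative sketch only: the function name `cluster_sample`, the city-block clusters, and the household labels are hypothetical, and clusters are drawn by simple random sampling.

```python
import random


def cluster_sample(clusters, n_clusters, n_per_cluster=None, seed=None):
    """Draw a cluster sample.

    clusters: dict mapping cluster name -> list of elements.
    n_per_cluster=None gives one-stage sampling (all elements of each chosen
    cluster); an integer gives two-stage sampling (SRS within each chosen cluster).
    """
    rng = random.Random(seed)
    # Stage 1: simple random sample of clusters.
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = {}
    for name in chosen:
        elements = clusters[name]
        if n_per_cluster is None:
            sample[name] = list(elements)              # one-stage: take everyone
        else:
            k = min(n_per_cluster, len(elements))
            sample[name] = rng.sample(elements, k)     # stage 2: SRS of elements
    return sample


# Hypothetical population: 6 city blocks with 10 households each.
blocks = {
    f"block_{b}": [f"household_{b}_{h}" for h in range(10)]
    for b in range(6)
}

one_stage = cluster_sample(blocks, n_clusters=2, seed=1)
two_stage = cluster_sample(blocks, n_clusters=3, n_per_cluster=4, seed=1)
print({name: len(members) for name, members in one_stage.items()})
print({name: len(members) for name, members in two_stage.items()})
```

Note the contrast with stratified sampling: here only some of the groups are selected, whereas in stratified sampling every stratum contributes elements to the sample.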
In probability proportionate to size sampling, the clusters are sampled with probability proportional to size. In the second stage, the probability of selecting a sampling unit in a selected cluster varies inversely with the size of the cluster.

Figure: Types of Cluster Sampling

Figure: A Graphical Illustration of Cluster Sampling

Hypothesis Testing

A null hypothesis (H0) is a statement of the status quo, one of no difference or no effect. If the null hypothesis is not rejected, no changes will be made.

An alternative hypothesis (H1) is one in which some difference or effect is expected. Accepting the alternative hypothesis will lead to changes in opinions or actions.

The null hypothesis refers to a specified value of the population parameter (e.g., μ, σ, π), not a sample statistic (e.g., X̅).

Figure: Steps Involved in Hypothesis Testing

Figure: Data Preparation Process

Questionnaire Checking

A questionnaire returned from the field may be unacceptable for several reasons:
- Parts of the questionnaire may be incomplete.
- The pattern of responses may indicate that the respondent did not understand or follow the instructions.
- The responses show little variance.
- One or more pages are missing.
- The questionnaire is received after the preestablished cutoff date.
- The questionnaire is answered by someone who does not qualify for participation.

Editing: Treatment of Unsatisfactory Results

- Returning to the Field: The questionnaires with unsatisfactory responses may be returned to the field, where the interviewers recontact the respondents.
- Assigning Missing Values: If returning the questionnaires to the field is not feasible, the editor may assign missing values to unsatisfactory responses.
- Discarding Unsatisfactory Respondents: In this approach, the respondents with unsatisfactory responses are simply discarded.

Coding

Coding means assigning a code, usually a number, to each possible response to each question. The code includes an indication of the column position (field) and data record it will occupy.
Coding Questions

- Fixed field codes, which mean that the number of records for each respondent is the same and the same data appear in the same column(s) for all respondents, are highly desirable.
- If possible, standard codes should be used for missing data.
- Coding of structured questions is relatively simple, since the response options are predetermined.
- In questions that permit a large number of responses, each possible response option should be assigned a separate column.

Guidelines for Coding Unstructured Questions

- Category codes should be mutually exclusive and collectively exhaustive.
- Only a few (10% or less) of the responses should fall into the "other" category.
- Category codes should be assigned for critical issues even if no one has mentioned them.
- Data should be coded to retain as much detail as possible.

Coding Questionnaires

- The respondent code and the record number appear on each record in the data.
- The first record contains the additional codes: project code, interviewer code, date and time codes, and validation code.
- It is a good practice to insert blanks between parts.

Restaurant Preference

Id   Preference   Quality   Quantity   Value   Service   Income
 1   2            2         3          1       3         6
 2   6            5         6          5       7         2
 3   4            4         3          4       5         3
 4   1            2         1          1       2         5
 5   7            6         6          5       4         1
 6   5            4         4          5       4         3
 7   2            2         3          2       3         5
 8   3            3         4          2       3         4
 9   7            6         7          6       5         2
10   2            3         2          2       2         5
11   2            3         2          1       3         6
12   6            6         6          6       7         2
13   4            4         3          3       4         3
14   1            1         3          1       2         4
15   7            7         5          5       4         2
16   5            5         4          5       5         3
17   2            3         1          2       3         4
18   4            4         3          3       3         3
19   7            5         5          7       5         5
20   3            2         2          3       3         3

Figure: Codebook Excerpt

Figure: Example of Questionnaire Coding

Figure: Data Transcription

Data Cleaning: Consistency Checks

Consistency checks identify data that are out of range, logically inconsistent, or have extreme values. Computer packages like SPSS, SAS, EXCEL, and MINITAB can be programmed to identify out-of-range values for each variable and print out the respondent code, variable code, variable name, record number, column number, and out-of-range value. Extreme values should be closely examined.
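An out-of-range check of the kind described above can be sketched in a few lines of Python. The valid ranges used here (1 to 7 for the rating variables, 1 to 6 for income) are an assumption, since the slides do not reproduce the codebook, and respondent 99 is a hypothetical bad record added only to trigger the check; it is not part of the Restaurant Preference data.

```python
# Assumed valid ranges; the actual codebook is not shown in the slides.
VALID_RANGES = {
    "preference": (1, 7),
    "quality": (1, 7),
    "quantity": (1, 7),
    "value": (1, 7),
    "service": (1, 7),
    "income": (1, 6),
}


def find_out_of_range(records, valid_ranges):
    """Return (respondent id, variable name, value) for every out-of-range value."""
    problems = []
    for rec in records:
        for var, (low, high) in valid_ranges.items():
            value = rec[var]
            if not low <= value <= high:
                problems.append((rec["id"], var, value))
    return problems


records = [
    # First three respondents from the Restaurant Preference table.
    {"id": 1, "preference": 2, "quality": 2, "quantity": 3, "value": 1, "service": 3, "income": 6},
    {"id": 2, "preference": 6, "quality": 5, "quantity": 6, "value": 5, "service": 7, "income": 2},
    {"id": 3, "preference": 4, "quality": 4, "quantity": 3, "value": 4, "service": 5, "income": 3},
    # Hypothetical record with two deliberate errors (not in the slide data).
    {"id": 99, "preference": 3, "quality": 9, "quantity": 4, "value": 2, "service": 5, "income": 0},
]

problems = find_out_of_range(records, VALID_RANGES)
for respondent_id, variable, value in problems:
    print(f"Respondent {respondent_id}: {variable} = {value} is out of range")
```

The check only flags values; deciding whether to correct, recode, or discard a flagged response remains a judgment for the editor.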
Data Cleaning: Treatment of Missing Responses

- Substitute a Neutral Value: A neutral value, typically the mean response to the variable, is substituted for the missing responses.
- Substitute an Imputed Response: The respondent's pattern of responses to other questions is used to impute or calculate a suitable response to the missing questions.
- Casewise Deletion: In casewise deletion, cases, or respondents, with any missing responses are discarded from the analysis.
- Pairwise Deletion: In pairwise deletion, instead of discarding all cases with any missing values, the researcher uses only the cases or respondents with complete responses for each calculation.

Statistically Adjusting the Data: Weighting

- In weighting, each case or respondent in the database is assigned a weight to reflect its importance relative to other cases or respondents.
- Weighting is most widely used to make the sample data more representative of a target population on specific characteristics.
- Yet another use of weighting is to adjust the sample so that greater importance is attached to respondents with certain characteristics.

Table: Use of Weighting for Representativeness

Statistically Adjusting the Data: Variable Respecification

- Variable respecification involves the transformation of data to create new variables or modify existing variables. For example, the researcher may create new variables that are composites of several other variables.
- Dummy variables are used for respecifying categorical variables. The general rule is that to respecify a categorical variable with K categories, K-1 dummy variables are needed.

Product Usage     Original Variable   Dummy Variable Code
Category          Code                X1    X2    X3
Nonusers          1                   1     0     0
Light users       2                   0     1     0
Medium users      3                   0     0     1
Heavy users       4                   0     0     0

Note that X1 = 1 for nonusers and 0 for all others. Likewise, X2 = 1 for light users and 0 for all others, and X3 = 1 for medium users and 0 for all others.
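The K-1 rule in the table above can be reproduced with a small sketch. The helper `dummy_code` is a hypothetical name introduced for illustration; it treats the last category in the list (here, heavy users) as the reference category that receives all zeros, matching the table.

```python
def dummy_code(category, categories):
    """Return K-1 dummy variables for a categorical value.

    The last entry of `categories` is the reference category and is coded
    as all zeros.
    """
    if category not in categories:
        raise ValueError(f"Unknown category: {category!r}")
    return [1 if category == c else 0 for c in categories[:-1]]


USAGE_CATEGORIES = ["Nonusers", "Light users", "Medium users", "Heavy users"]

for category in USAGE_CATEGORIES:
    x1, x2, x3 = dummy_code(category, USAGE_CATEGORIES)
    print(f"{category:<13} X1={x1} X2={x2} X3={x3}")
```

A fourth dummy variable for heavy users would be redundant, because a respondent with X1 = X2 = X3 = 0 can only be a heavy user.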
In analyzing the data, X1, X2, and X3 are used to represent all user/nonuser groups.

Thank You