Collect, Prepare, and Examine Data PDF
Summary
This document outlines the critical steps in data collection, including training data collectors, determining timelines, implementing processes, inviting participants, activating tasks, reminding participants, and entering data. It covers key elements of data entry, such as creating data files, coding schemes, and technological advancements. The document also emphasizes the importance of data preparation through post-collection coding and editing of data.
BM2406 COLLECT, PREPARE, AND EXAMINE DATA
07 Handout 1 *Property of STI

Collect the Data (Schindler, 2022)
The critical tasks involved in data collection are as follows:
1. Train the Data Collectors. Ensure data collectors are well prepared, especially for complex instruments or when interviewers need to follow specific procedures. Training covers understanding survey instructions, managing participant interactions to minimize bias, and correctly submitting completed instruments.
2. Determine the Data Collection Timeline. Establish a timeline that covers training schedules, task activation times, start and end times for data entry, editing periods, and the date when the clean data file will be ready.
3. Implement Research Processes. Design and execute the procedures, whether automated or manual, that ensure measurement instruments are properly distributed and returned. For example, in a hotel satisfaction survey, a questionnaire is placed in the room and collected upon checkout.
4. Invite Participants. Craft and send invitations to selected participants so that the invitation process builds rapport and secures cooperation. Invitations can be made in person or by phone, email, or mail, and may include prescreening questions and prepared scripts or letters.
5. Activate Research Tasks. Launch each research task once it is ready. Activating a task means the researcher has ensured it is prepared and as error-free as possible before launching the part of the process that distributes the questionnaire or other instruments.
6. Remind Participants. To improve response rates, send participants reminders to complete tasks by email, text, or phone call. Multiple reminders may be used, depending on the study's requirements.
7. Enter the Data. Code the responses and create a standardized data file that is ready for analysis. Computerized surveys automate this step, while other methods may require manual data entry.

Enter the Data (Schindler, 2022)
Data entry transforms collected data into a format that can be viewed and manipulated for analysis. This process involves creating a data file using software programs that simplify storage, retrieval, and updating. Key elements of the data entry process include:
Creating Data Files. Data entry produces a data file. Software lets researchers define data fields and link files for efficient management. Examples include statistics programs and spreadsheet applications like Excel, which provide a flexible means for data entry and viewing.
Coding Scheme. Accurate data entry relies on a coding scheme (codebook) that specifies how responses are mapped to variables. The coding scheme includes variable IDs, names, labels, locations, response codes, and variable types. Pretesting the instrument helps identify and correct coding issues before final data collection.
Keyboarding and Technological Advancements. Manual data entry through keyboarding is still standard but can be time-consuming and prone to errors. Technological advancements such as online surveys (e.g., MS Forms), barcode or QR scanning, voice recognition, and electronic tablets have improved efficiency and accuracy.
Data Entry Tools. Database programs, spreadsheets, and statistical analysis packages are essential tools for data entry, enhancing the speed and accuracy of the process.

Prepare the Data (Schindler, 2022)
Seasoned researchers know that raw data is not immediately ready for analysis. Data preparation involves two (2) critical tasks: post-collection coding and data editing.
These steps ensure data accuracy and convert raw data into a format suitable for analysis.

Post-Collection Coding of Data
Coding Open-Ended Responses. Responses to open-ended questions must be coded: commonalities among the responses are identified and sorted into categories (codes), which turns qualitative data into quantitative data. Researchers often reassess predetermined categories after data collection to ensure they are still relevant. Content analysis is a systematic method for coding textual or verbal responses to identify patterns and draw inferences.
Content Analysis. This method can be applied to many types of data, including interviews, survey responses, and social media posts. Researchers define units of data (context, sampling, or recording) and categorize them for analysis into syntactical units (specific, author-defined words, phrases, sentences, or paragraphs), referential units (words and phrases that describe objects), propositional units (assertions about an object), or thematic units (topics contained within and across texts).
Reliability in Coding. Both intra-rater reliability (consistent coding by the same person) and inter-rater reliability (consistent coding by different people) are crucial. Computerized content analysis helps maintain reliability and validity.

Editing Data
Structured Questions. Responses to structured questions can be anticipated and pre-coded during instrument design. However, researchers may need to adjust coding schemes based on preliminary data analysis. For example, a seven (7)-point scale might be reduced to a three (3)-point scale if that better suits the data patterns.

Edit the Data (Schindler, 2022)
Editing ensures that data variables are correctly coded and that all collected data is accurately entered. This process detects and corrects errors to safeguard data quality. A descriptive statistical summary helps identify mistakes, and thorough editing remains essential even with online surveys.
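As a minimal sketch of how a descriptive summary can surface entry errors during editing, the Python snippet below tallies the codes keyed in for one variable and flags any that fall outside the codebook. The variable, its valid codes (1 to 5), the function name, and the sample entries are all hypothetical and are not taken from Schindler (2022).

```python
from collections import Counter

# Hypothetical codebook entry: valid response codes for a 5-point item.
VALID_CODES = {1, 2, 3, 4, 5}


def summarize_codes(entries, valid_codes):
    """Return (frequency table, out-of-range codes, number of missing entries)."""
    missing = sum(1 for e in entries if e is None)
    counts = Counter(e for e in entries if e is not None)
    out_of_range = sorted(code for code in counts if code not in valid_codes)
    return dict(sorted(counts.items())), out_of_range, missing


# Hypothetical keyed-in data: 44 looks like a keying error, None is a skipped item.
entries = [4, 3, 5, 44, 2, None, 3, 4]
freq, bad, missing = summarize_codes(entries, VALID_CODES)

print("Frequency table:", freq)
print("Out-of-range codes:", bad)
print("Missing entries:", missing)
```

A flagged code such as 44 would then be checked against the original instrument rather than corrected by guesswork.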
The following are the primary purposes of data editing:
Completeness. Data must be ready for analysis. Incomplete entries, such as abbreviations or shorthand used during interviews, need to be translated soon after collection. Missing entries should be followed up with callbacks rather than filled in by guessing, to avoid bias.
Accuracy. Editing verifies adherence to research protocols and checks for fake or inaccurate data. It detects distinctive response patterns that indicate potential falsification. For instance, similar handwriting or sequential completion times on self-administered surveys may indicate suspicious responses. It also identifies participant entry errors, such as answering on the wrong scale or giving multiple answers to a single question, and corrects these based on consistency with other data.
Appropriate Coding. Editing ensures that the preliminary coding scheme is suitable for analysis. Categories for structured questions must be mutually exclusive, exhaustive, and focused on a single dimension, which facilitates comparison and cross-tabulation of variables.

The following are the common issues encountered in data editing:
"Don't Know" (DK) Responses
Legitimate DK responses. If anticipated and included appropriately, DK responses are beneficial. For example, in a survey asking, "What year was the original iPhone released?" some respondents might not know the answer. If the survey includes a DK option, these respondents can indicate their uncertainty without guessing, which maintains the accuracy of the data.
Problematic DK responses. These indicate issues with the measurement question and should be excluded from analysis to avoid bias. For example, if a survey asks, "Do you agree or disagree with the new policy changes implemented last month?" many DK responses might suggest that respondents are unfamiliar with the policy changes. This points to an issue with the question design or the respondents' awareness.
Including these DK responses in the analysis could introduce bias, as they do not provide meaningful information about the respondents' opinions.

Missing Data
Data Missing Completely at Random (MCAR). The missingness does not depend on the variable itself or on other variables. For example, a respondent accidentally skips a survey question because they were distracted. The missing response is unrelated to the content of the question or the respondent's characteristics. This type of missing data does not introduce bias because the reason for the missingness is entirely random.
Data Missing at Random (MAR). The missingness depends on other variables but not on the variable itself. For example, in a health survey, respondents with higher education levels are likelier to skip a question about income. The missing income data is related to education level but not to income itself. Statistical methods can account for this missing data by considering the relationship with other variables.
Data Not Missing at Random (NMAR). The missingness depends on the variable itself, such as when participants intentionally skip a question. For example, in a survey on mental health, respondents with severe depression might skip questions about their mental state due to discomfort or stigma. The missing data is directly related to the severity of their depression. Ignoring this type of missing data can lead to biased results, as the reason for the missingness is related to the variable of interest.
Correction Techniques. These include listwise deletion, pairwise deletion, and predictive replacement, each with potential biases depending on the type of missing data. For example, under listwise deletion, if a dataset has 100 respondents and five (5) of them have missing values for one or more variables, those five (5) respondents are removed from the analysis entirely. This discards valuable data and can bias results, especially if the missing data is MAR or NMAR.
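The three correction techniques can be sketched in a few lines of Python. This is an illustrative sketch only: the records and function names are hypothetical, and mean substitution is used here as a simple stand-in for predictive replacement, since the handout does not specify a prediction method.

```python
from statistics import mean

# Hypothetical records: None marks a missing value.
records = [
    {"age": 25, "income": 200, "rating": 4},
    {"age": 34, "income": None, "rating": 3},
    {"age": 45, "income": 300, "rating": None},
    {"age": 29, "income": 180, "rating": 2},
]


def listwise_delete(rows):
    """Keep only rows that have no missing value on any variable."""
    return [r for r in rows if all(v is not None for v in r.values())]


def pairwise_values(rows, var_a, var_b):
    """Keep rows that are complete on the two variables being analyzed."""
    return [(r[var_a], r[var_b]) for r in rows
            if r[var_a] is not None and r[var_b] is not None]


def mean_replace(rows, var):
    """Fill missing values of var with the mean of its observed values."""
    fill = mean(r[var] for r in rows if r[var] is not None)
    return [{**r, var: fill if r[var] is None else r[var]} for r in rows]


complete = listwise_delete(records)
age_rating = pairwise_values(records, "age", "rating")
filled = mean_replace(records, "income")

print(len(complete), "rows survive listwise deletion")
print(len(age_rating), "rows are usable for an age-by-rating analysis")
print("Filled income for respondent 2:", round(filled[1]["income"], 2))
```

In this sketch, listwise deletion drops every incomplete row, while pairwise deletion keeps a row whenever the two variables under analysis are both present, so the usable sample size changes from one analysis to the next.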
Real-time tallying in computerized instruments can prompt participants for corrections when they enter out-of-range codes, select multiple responses, or skip variables. Overall, editing ensures data completeness, accuracy, and proper coding, facilitating reliable and valid research analysis.

Examine the Data
Exploratory Data Analysis (EDA) is used to simplify data presentation through descriptive statistics and initial graphical displays, known as data visualization (Schindler, 2022).
Data visualizations are graphic representations of data that help people understand patterns and trends, identify relationships between variables, and spot outliers, even in complex datasets. Visual elements such as charts, graphs, and maps are used to present information in an accessible and understandable way (Hammond, 2023).

The following are the different types of charts and graphs (Schindler, 2022):
Frequency Tables, Bar Charts, and Pie Graphs. These are useful for nominal variables but less so for interval-ratio data with many values. Frequency tables array data by response codes, while bar charts and pie graphs offer more precise visual insights.
Histograms. These are ideal for interval-ratio data. They group values into intervals to display distribution shape, skewness, kurtosis, and outliers. The number of intervals is often based on the square root of the number of observations.
Stem-and-Leaf Displays. These are closely related to histograms but, unlike histograms, preserve the actual data values, which allows direct inspection and keeps the rank order. They provide immediate visual insights into data range, clusters, gaps, and outliers.
Pareto Diagrams. These are bar charts whose bars are sorted by decreasing importance and whose percentages sum to 100%. They highlight key issues, as in a laptop repair complaint analysis in which the top two problems account for 80% of issues.
Scatter Plot. It displays individual data points on a two-dimensional graph.
Each point represents the values of two variables, with one variable plotted along the x-axis and the other along the y-axis, allowing for the identification of potential correlations or patterns between the variables.

Scenario
You are a data analyst at a retail store, tasked with analyzing customer satisfaction and spending patterns. You have collected data from a survey that includes customer satisfaction ratings, average monthly spending, age, and product categories purchased. You will use various graphs and charts to visualize and interpret this data.

Sample Data
Customer Information:
Customer ID
Age
Gender
Average Monthly Spending
Satisfaction Rating (1-5 scale)
Product Categories Purchased (Electronics, Clothing, Groceries, etc.)

Data Set:
Customer ID   Age   Gender   Avg Monthly Spending   Satisfaction Rating   Product Categories Purchased
1             25    M        200                    4                     Electronics, Clothing
2             34    F        150                    3                     Groceries, Clothing
3             45    F        300                    5                     Electronics, Groceries
4             29    M        180                    2                     Clothing
5             52    M        250                    4                     Electronics, Groceries, Clothing
6             37    F        220                    3                     Clothing
7             23    F        100                    2                     Groceries
8             40    M        400                    5                     Electronics
9             31    F        120                    3                     Groceries, Electronics
10            48    M        310                    4                     Clothing, Groceries

Frequency Table
The frequency of each satisfaction rating.
Satisfaction Rating   Count
1                     0
2                     2
3                     3
4                     3
5                     2

Bar Chart
The count of each product category purchased.
Product Category   Count
Electronics        5
Clothing           6
Groceries          6

Pie Chart
The proportion of satisfaction ratings.
Satisfaction Rating   Percentage
1                     0%
2                     20%
3                     30%
4                     30%
5                     20%

Histogram
The distribution of average monthly spending.
Interval   Count
100-149    2
150-199    2
200-249    2
250-299    1
300-349    2
350-399    0
400-449    1

Stem-and-Leaf Display
The distribution of ages.
Stem   Leaf
5      2
4      0 5 8
3      1 4 7
2      3 5 9

Pareto Diagram
The count of each product category purchased, sorted in decreasing order, with the cumulative percent.
Product Category   Count   Cumulative Percent
Clothing           6       35%
Groceries          6       71%
Electronics        5       100%

Scatter Plot
The relationship between age and average monthly spending.
Age   Avg Monthly Spending
23    100
25    200
29    180
31    120
34    150
37    220
40    400
45    300
48    310
52    250

Note: Download 07 Data Visualization.xls from your eLMS to access the sample data, including the created graphs and charts. Data presentation of feasibility studies MUST NOT be limited to pie charts and frequency tables. Use some variations for better visualization and analysis of data.

References:
Hammond, T. (2023). 10 Types of Charts And Graphs for Data Visualization. https://www.thoughtspot.com/data-trends/data-visualization/types-of-charts-graphs
Schindler, P. (2022). Business Research Methods (14th ed.). McGraw Hill.