Data Collection Lecture Notes PDF

Summary

This document is a lecture presentation on data collection, focusing on the CRISP-DM methodology and different data collection methods (primary and secondary). It covers data preparation, business understanding, and evaluation processes related to data collection.

Full Transcript

Data Collection
Dr. Amira Abdelatey

Agenda
- Data life cycle
- Data collection
- Why data collection?
- Data collection methods
- What before data collection?

Data Life Cycle: Traditional Data Mining Life Cycle
CRISP-DM Methodology: CRISP-DM stands for Cross-Industry Standard Process for Data Mining. It is a cycle that describes the approaches data mining experts commonly use to tackle problems in traditional BI data mining.

CRISP-DM Methodology (1)
Business Understanding: understanding the project objectives and requirements from a business perspective, and then converting this knowledge into a data mining problem definition.
Data Understanding: starts with initial data collection and proceeds with activities to get familiar with the data, identify data quality problems, and discover first insights into the data.
Data Preparation: covers all activities needed to construct the final dataset (the data that will be fed into the modeling tools) from the initial raw data. Data preparation tasks are likely to be performed multiple times, and not in any prescribed order. Tasks include table, record, and attribute selection, as well as transformation and cleaning of data for the modeling tools.

CRISP-DM Methodology (2)
Modeling: various modeling techniques are selected and applied, and their parameters are calibrated to optimal values. Typically, there are several techniques for the same type of data mining problem. Some techniques have specific requirements on the form of the data, so it is often necessary to step back to the data preparation phase.
Evaluation: before proceeding to final deployment of the model, it is important to evaluate the model thoroughly and review the steps executed to construct it, to be certain that it properly achieves the business objectives.
Deployment: creation of the model is generally not the end of the project.
Even if the purpose of the model is to increase knowledge of the data, the knowledge gained will need to be organized and presented in a way that is useful to the customer.

Data Collection
Data collection is a systematic process of gathering observations or measurements. It allows you to gain first-hand knowledge and original insights into your problem. Data collection methods are used in businesses and sales organizations to analyze the outcome of a problem, arrive at a solution, draw conclusions about the performance of a business, and so on. Data collection is either qualitative or quantitative.

Why data collection?
Data collection is done to analyze a problem and learn about its outcome and future trends. When a question needs an answer, data collection methods help to make assumptions about the future result.

Data collection methods
It is very important to collect reliable data from the correct sources, because this makes the calculations and analysis easier.
Two types of data collection methods:
- Primary data collection methods
- Secondary data collection methods
Primary data collection is more time consuming than secondary data collection.

Primary data collection
Primary data is data in its original form, collected directly from the source. Examples include data collected through surveys, opinion polls, and experiments.
Primary data collection methods can be classified into the following two types:
- Quantitative data collection methods
- Qualitative data collection methods

Qualitative data collection methods: these do not involve any mathematical calculation to collect the data. They are mainly used for analyzing the quality of something, or for understanding the reason behind it.
Quantitative data collection methods: the term 'quantity' refers to a certain number. Quantitative data collection methods express the data in figures or numbers, using either traditional methods or online data collection methods.
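As an illustration of the difference between the two kinds of primary data, the following minimal Python sketch summarizes a few survey responses. The responses, the theme keywords, and the function names (average_rating, tally_themes) are all made up for this example and do not come from the lecture. The numeric rating is treated as quantitative data and averaged; the free-text comment is treated as qualitative data and is only sorted into themes, which is one simple way (among many) to work with it.

```python
import re
from collections import Counter
from statistics import mean

# Hypothetical survey responses: a numeric rating (quantitative data)
# and a free-text comment (qualitative data).
SAMPLE_RESPONSES = [
    {"rating": 4, "comment": "Fast delivery, friendly staff"},
    {"rating": 2, "comment": "Delivery was late"},
    {"rating": 5, "comment": "Great price and friendly support"},
]

# Hypothetical coding scheme: each theme maps to keywords that signal it.
SAMPLE_THEMES = {
    "delivery": {"delivery", "late"},
    "service": {"friendly", "staff", "support"},
    "price": {"price"},
}


def average_rating(responses):
    """Quantitative summary: the mean of the numeric ratings."""
    return mean(r["rating"] for r in responses)


def tally_themes(responses, themes):
    """Qualitative summary: count how many comments touch each theme."""
    counts = Counter()
    for r in responses:
        # Split the comment into lowercase words before matching keywords.
        words = set(re.findall(r"[a-z]+", r["comment"].lower()))
        for theme, keywords in themes.items():
            if words & keywords:
                counts[theme] += 1
    return counts


if __name__ == "__main__":
    print("Average rating:", round(average_rating(SAMPLE_RESPONSES), 2))
    print("Theme counts:", dict(tally_themes(SAMPLE_RESPONSES, SAMPLE_THEMES)))
```

Keyword matching is a deliberately crude stand-in for real qualitative analysis, which normally relies on a person reading and interpreting the answers.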
Secondary data collection
Data collected by another person, which has already been through some statistical analysis, is called secondary data. The data you need is readily available and does not require any special collection methods. Examples: data from sensors, magazines, and documents. The main advantage of this type of data collection is that the data is easy to collect, because it is readily available.

What before data collection?
1. The aim of the problem
2. The type of data that you will collect
3. The methods and procedures you will use to collect, store, and process the data

Step 1: Define the aim of the problem
1) You need to identify exactly what you want to achieve. You can start by writing a problem statement.
2) Depending on the problem, you need to collect quantitative or qualitative data.
3) You can use a mixed methods approach that collects both types of data.

Step 2: Choose your data collection method
- Experimental research is primarily a quantitative method.
- Interviews and focus groups are qualitative methods.
- Surveys, observations, archival research, and secondary data collection can be quantitative or qualitative methods.

Step 3: Plan your data collection procedures
What procedures will you follow to make accurate observations or measurements of the variables you are interested in?
- If you’re conducting surveys or interviews, decide what form the questions will take.
- If you’re conducting an experiment, make decisions about your experimental design.

Step 4: Collect the data
Ensure that high-quality data is recorded:
- Record all relevant information as and when you obtain data.
- Double-check manual data entry for errors.
- If you collect quantitative data, you can assess its reliability and validity to get an indication of your data quality.
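
One possible way to double-check manual data entry is to have the same records typed in twice and compare the two copies, then flag numeric values that fall outside a plausible range. The Python sketch below illustrates that idea under stated assumptions: the function names (find_entry_mismatches, find_out_of_range), the field names, and the age limits are hypothetical and are not taken from the lecture.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def find_entry_mismatches(
    first_pass: List[Record], second_pass: List[Record]
) -> List[Tuple[int, str, str, str]]:
    """Compare two independently typed copies of the same records.

    Returns (row index, field name, first value, second value) for every
    field whose values differ after trimming surrounding whitespace.
    Rows are paired by position; extra rows in the longer copy are ignored.
    """
    mismatches = []
    for row_index, (a, b) in enumerate(zip(first_pass, second_pass)):
        for field in sorted(set(a) | set(b)):
            value_a = a.get(field, "").strip()
            value_b = b.get(field, "").strip()
            if value_a != value_b:
                mismatches.append((row_index, field, value_a, value_b))
    return mismatches


def find_out_of_range(
    records: List[Record], field: str, low: float, high: float
) -> List[int]:
    """Return the indexes of rows whose field is missing, non-numeric,
    or outside the inclusive range [low, high]."""
    flagged = []
    for row_index, record in enumerate(records):
        try:
            value = float(record.get(field, ""))
        except ValueError:
            flagged.append(row_index)
            continue
        if not (low <= value <= high):
            flagged.append(row_index)
    return flagged


if __name__ == "__main__":
    # Hypothetical data: the second typist transposed the digits of one age.
    first = [{"id": "1", "age": "34"}, {"id": "2", "age": "27"}]
    second = [{"id": "1", "age": "34"}, {"id": "2", "age": "72"}]
    print(find_entry_mismatches(first, second))
    print(find_out_of_range(first, "age", 0, 120))
```

A mismatch only shows that one of the two entries is wrong, so each flagged row still has to be checked against the original form or measurement.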
