Module 1: Managing Databases
Emlyon Business School
Summary
This document introduces module 1 on managing databases in data and content analysis. Key topics include survey design, data types, and variable analysis. The presentation covers the preparation and analysis of data.
Full Transcript
DATA & CONTENT ANALYSIS
MODULE 1: MANAGING THE DATA BASE

TODAY’S AGENDA
- Let us start by asking ourselves the fundamental question: why this course? Let us try to find some answers by watching a short video.
- What are the steps in data analysis?
- The different types of variables
- Preparing the data
- The Sphinx Campus environment
- Group work

MARKET RESEARCH PROCESS
1) Problem definition
2) Research design
3) Data collection
4) Preparing & analysing data
5) Reporting

STEPS IN DATA ANALYSIS
- Check and prepare the database: check for accuracy, completeness and consistency; treat missing responses.
- Perform univariate analyses: they allow you to describe a database, give a first idea of the results, and let you create tables and charts.
- Perform bivariate (or multivariate) analyses: they allow you to understand links between two (bivariate) or more (multivariate) variables.
- Write the report (this is not the objective of the course, but there will be some recommendations during the last class).

SOME TERMINOLOGY

SURVEYS
A survey is a quantitative research method that consists of standardized questions administered to a large sample of participants.
Surveys use questionnaires, which consist of a set of questions that can be asked as:
- Open-ended questions: the respondent is free to answer in his/her own words.
- Closed-ended questions: the choice of responses is predetermined.
  - Dichotomous questions: yes or no
  - Multiple choice
  - Likert scales/semantic differential

REVIEW: QUESTION TYPOLOGY IN A QUESTIONNAIRE
Open-ended question:
- Box to be filled in by the respondent in his/her own words
- For collecting spontaneous answers
PLEASE NOTE:
Open-ended questions must be avoided as much as possible in large surveys because the analysis of the resulting data is long and complex [you will discover this in lecture 3]. If your survey requires multiple open-ended questions because you are unable to develop closed-ended questions, you should question the suitability of the survey method for your research aims.

REVIEW: QUESTION TYPOLOGY IN A QUESTIONNAIRE
Closed-ended question:
- Predetermined options
- Dichotomous question: one answer between two options
- Only one answer possible
- Multiple answers allowed

REVIEW: QUESTION TYPOLOGY IN A QUESTIONNAIRE
Likert scale (closed-ended question):
- Very often used in questionnaires
- Easy for participants

SURVEYS
- The objective of the survey is to receive responses on all questions from each survey respondent*.
- Each question corresponds to one variable in the database.
- The different responses obtained for each variable are named outcomes.
* A survey respondent is the generic term for whoever is answering the questionnaire. It can be a consumer, an employee answering on behalf of a company, etc.

SURVEYS
- The questionnaire is administered to a sample of the population.
- After each individual answers the questionnaire, a string of outcomes is created (yes, blue, panzani, 27, woman, …).
- This string of responses constitutes one observation.
- All observations constitute the database.

THE DATABASE
- The set of responses gathered for all the people interviewed constitutes the database.
- The database is a matrix where rows represent the observations and columns represent the variables.
- One row corresponds to the answers of one person.
- One column corresponds to the answers of all the people in the sample for one specific question.
- The matrix consists of k * j cells, where k = number of columns, i.e. the number of variables, and j = number of rows, i.e. the size of the sample.

[Figure labels: One variable; One observation]

VARIABLE
By definition, a variable is a characteristic of an event or individual.
- In questionnaires, a variable = one question of the questionnaire.
- Therefore, each variable associates a value (numeric or not) with each individual in the sample.
- For instance, to each individual interviewed corresponds a value for age.
- The various values taken by the variable age are the outcomes.

EXAMPLE
Read the table carefully and answer the questions.

Place of residence    Average days in Lyon
Europe                4
Americas              2
Asia, Oceania         2
Africa                7

- What is the population under study?
- What questions were asked to the respondents?
- What are the variables?
- How many outcomes does the variable have?

NATURE OF VARIABLES
The nature of the variables influences the type of analysis.

Categorical/qualitative variable:
- A variable is qualitative if its outcomes are not numbers.
- The outcomes can be words, symbols, codes, etc.
- Examples: What is the brand of your car? What is your gender? What is your place of residence?
- It is the outcome of an open-ended question or a dichotomous/multiple-choice question.
- Beware! A postal code, a measurement scale, etc. can be represented by numbers, but that does not mean they are numeric variables.

Numeric variable:
- A variable is numeric if its outcomes are numbers.
- This type of variable corresponds to questions such as: What is your age? How much did you pay for your car? How many children do you have? How many rooms are there in your apartment?
- For each of these questions, outcomes are represented by numbers.

CATEGORICAL/QUALITATIVE VARIABLES
There are two main categories of qualitative variables (whose outcomes are not numbers).

Nominal variables:
- A variable is nominal when the purpose of the question is to categorize the responses.
- The outcomes of a nominal variable do not have a specific order. They are only categories:
  - What is your gender? Male/Female
  - What is the brand of your car? Audi, BMW, Citroën, Renault, etc.
  - What was the main reason you bought it? Comfort, design, price, fuel consumption, …

Ordinal variables:
- Alternatively, a variable is said to be ordinal if it is possible to classify the outcomes in a specific, logical order:
  - In which age group are you? Under 18 years; 18-24 years; 25-34 years, …
  - Please indicate your highest level of education: primary school, high school, bachelor degree, …

NUMERIC VARIABLES
There are two main categories of numeric variables.

Continuous numeric variable:
- A numeric variable is continuous if the values of its outcomes originate from a measurement; it can take on any value between -∞ and +∞.
- A continuous numeric variable can be expressed in decimals.
- The price of gasoline, the height of an individual, salary and age are all examples of continuous variables.
- This does not mean that they will always be expressed as decimals.

Discrete variable:
- A numeric variable is said to be discrete if the value of its outcomes is the result of counting.
- A discrete numeric variable is always expressed as an integer.
- Number of children, number of cars and number of televisions in the household are all examples of discrete numeric variables.
- To find out whether a numeric variable is continuous or discrete, ask yourself if the value can be expressed as a decimal (can a family have 2.3 children?).

The rules on statistical analysis are the same for both types of numeric variables.

THE CURIOUS CASE OF LIKERT SCALE QUESTIONS
What kind of variable is it? Numeric or categorical?

THE CURIOUS CASE OF LIKERT SCALE QUESTIONS
- Despite being qualitative variables by nature, Likert scales are treated as numeric discrete variables in social science.
- In this course, you will be using Likert scales as numeric discrete variables.

FROM NUMERIC TO CATEGORICAL
Numeric variables can always be transformed into categorical variables by creating subgroups of values. For example:

Place of residence    Average days in Lyon
Europe                4
Americas              2
Asia, Oceania         2
Africa                7

This numeric variable ranges from 0 to ∞. You can transform it by creating groups of values, for instance:
- Few days: from 0 to 4 days
- Some days: from 5 to 7 days
- Several days: more than 7 days
Whenever possible, prefer asking questions that yield numeric variables and, if needed, transform them into categorical variables afterwards. Numeric variables allow you to perform more statistical analyses!

HOW TO IDENTIFY THE NATURE OF THE VARIABLE ON SPHINX
In Sphinx Campus, the list of variables is preceded by a small symbol indicating the nature of each variable (module « Analysis »).

WHAT THE SYMBOLS MEAN
- Categorical/qualitative, unique response
- Categorical, multiple response
- Likert scale
- Numeric, open
- Postal code (question library): qualitative variable

QUESTIONS?

FEW REMARKS ON ETHICS
When analyzing data, you must be ethical towards the research community.
This means:
- No cheating, fabrication, or alteration of data
- Analytical rigor: attention to the analysis performed
- Completeness: the need to report unsatisfactory results as well

PREPARING THE DATABASE

DATA PREPARATION AND ANALYSIS STRATEGY
Once you finish the data collection, you have to perform the following steps before analysing the data:
- Questionnaire checking
- Checking for database completeness & data quality
- Coding & transcribing
- Data cleaning
- Variable respecification and recoding
- Selecting the data analysis strategy
You must familiarize yourself with the questionnaire to define the right analyses to be performed.

QUESTIONNAIRE CHECKING: COMPLETENESS
It consists of screening out the observations that are incomplete or illegible (in case you collected data with paper and pencil). When an observation is incomplete, it is unsatisfactory.
What to do with unsatisfactory responses/observations?
- Return to the field to fill out incomplete answers (if the sample is small)
- Define how to treat missing data

MISSING DATA TREATMENT
Various methods exist to treat missing values:
- Substitute a neutral value, such as the mean or mode.
- Casewise deletion: observations with any missing value are discarded from the analysis.
- Pairwise deletion: for each analysis, only the observations that are complete on the variables involved in that analysis are considered.
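The three treatments can be sketched outside Sphinx Campus on a tiny, made-up database. This is only an illustration under stated assumptions: the rows, the variable names `age` and `children`, and the helper functions are hypothetical, `None` stands for a missing response, casewise deletion is taken to mean dropping every observation with at least one missing value, and pairwise deletion is taken to mean keeping, for a given analysis, the observations that are complete on the variables used in that analysis.

```python
from statistics import mean

# Hypothetical toy database: one dict per observation, None = missing response.
rows = [
    {"age": 27, "children": 2},
    {"age": None, "children": 1},
    {"age": 41, "children": None},
    {"age": 35, "children": 0},
]


def substitute_mean(rows, variable):
    """Replace missing values of one variable by the mean of the observed values."""
    observed = [r[variable] for r in rows if r[variable] is not None]
    fill = mean(observed)
    return [{**r, variable: fill if r[variable] is None else r[variable]} for r in rows]


def casewise_deletion(rows):
    """Keep only the observations with no missing value at all."""
    return [r for r in rows if all(v is not None for v in r.values())]


def pairwise_deletion(rows, variables):
    """Keep the observations that are complete on the variables of one analysis."""
    return [r for r in rows if all(r[v] is not None for v in variables)]


print(len(casewise_deletion(rows)))           # observations usable for every analysis
print(len(pairwise_deletion(rows, ["age"])))  # observations usable for an analysis of age only
```

With pairwise deletion the number of usable observations can differ from one analysis to the next, which is worth reporting alongside the results.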
Sphinx Campus uses casewise deletion to treat missing data for multiple analyses; you need to indicate to Sphinx that you do not want to consider missing data [see Lecture 2].

THE SPHINX CAMPUS ENVIRONMENT

SPHINX CAMPUS HOMEPAGE
[Screenshot labels]
- Module for developing a survey
- Module for distributing a survey
- Database
- Module for analyzing data
- Surveys available by default

MANAGING DATABASES ON SPHINX CAMPUS
Module “Data” in Sphinx Campus:
- General functions
- Analyzing data using a filter (strata)
- Checking the quality of the sample
- Non-response treatment
Beware: there may be slight discrepancies between the PPT slides for Sphinx and your screen. Be sure to change the language of Sphinx to English!

To illustrate the various functions of this module, open the survey “Automobiles”, which is part of the default list of surveys.

HOMEPAGE OF THE “DATA” MODULE

GENERAL FUNCTIONS
- The spreadsheet shows the responses to all the questions for all respondents.
- In this case, the number of observations is 200 and there are 4 pages of observations (50 observations per page).
- The purple tab is the active one (the spreadsheet in this case).
- To the right of the screen you can see the name of the survey, the number of answers, and the nature of the sample (total or strata).
- The spreadsheet is locked by default, as indicated by the small lock icon.

UNLOCKING THE DATA
To view the entire screen, click on the small diagonal arrow on the right.

DELETING OBSERVATIONS
Select the observations to delete. Click on the “Delete” button.

DELETING OBSERVATIONS
THIS FUNCTION CAN BE USEFUL FOR THE ANALYSIS YOU HAVE TO DO FOR YOUR FIELD WORK.
BUT DO NOT DELETE OBSERVATIONS FROM SURVEYS PRESENT ON SPHINX.

ADDING OBSERVATIONS
After clicking the “+Add” button, a new line appears at the bottom of the page. You can then fill out the responses, question by question.
You may need to add observations to a database if you ask some participants to fill out a questionnaire with paper and pencil.

OTHER POSSIBILITIES
Click on the pencil next to the new question. You will be able to enter responses using the questionnaire.
This is another way to collect data. It is useful if you must enter the responses from face-to-face interviews, for instance.

OTHER FUNCTIONS
- Delete all records: used if you want to erase all responses. This may be useful after testing a questionnaire or if the questionnaire has been modified.
- Variables: use this button if you want to hide some variables from the spreadsheet view, to make it easier to read. The variable is not deleted.
- Export: you can transfer the data into an Excel or CSV sheet if you want to use another statistical package for analyzing the data. The opposite is not possible, i.e. if you have an Excel file, you cannot upload it onto Sphinx.

EXPORTING DATA

CHOOSING A FILTER
- When choosing a filter, you define a sub-sample on which you can perform specific analyses.
- For instance, you may want to look at responses from women only.
- In the menu, select “Define filter”.
- Select the variable to use as filter (31. Gender) and choose the response: “Woman”.

DEFINE A FILTER
Choose variable: 31. Gender, is among: Woman

SUBGROUP FOR WOMEN
There are 102 women in the sample.
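The same "variable is among responses" logic can be reproduced on exported data. The sketch below is a hedged illustration only: the observations, the key names `gender` and `brand`, and the `apply_filter` helper are hypothetical and do not reflect how Sphinx Campus implements filters.

```python
# Hypothetical exported observations; key names and values are made up for illustration.
observations = [
    {"id": 1, "gender": "Woman", "brand": "Renault"},
    {"id": 2, "gender": "Man", "brand": "Audi"},
    {"id": 3, "gender": "Woman", "brand": "BMW"},
    {"id": 4, "gender": "Man", "brand": "Citroën"},
]


def apply_filter(observations, variable, allowed):
    """Return the sub-sample whose outcome for `variable` is among `allowed`."""
    allowed = set(allowed)
    return [o for o in observations if o.get(variable) in allowed]


# Define the filter "gender is among Woman" and count the resulting sub-sample.
women = apply_filter(observations, "gender", ["Woman"])
print(len(women))
```

The original list is left untouched, so dropping the filter simply means going back to `observations`.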
If you want to cancel the filter, just click on the X.

CHECKING DATA
- Consistency check
- Filling rate/completion rate (% of respondents who answered at least one question)
- Non-response treatment
Click on the button “Sample Quality”.

SAMPLE QUALITY
This function gives a number of measures related to the quality of responses, the number of questionnaires completed, the number of variables completed, etc.
There is also a general diagnostic on the quality of the database.

QUALITY OF DATASET “AUTOMOBILES”
The number of variables might appear higher than the actual number of survey questions because Sphinx Campus automatically records additional data, such as the respondent's location.
To determine the number of questions in a questionnaire, check the questionnaire in the "Design" module.

SAMPLE QUALITY

SAMPLE QUALITY (BOTTOM OF THE PAGE)
Cut-off point: 80% completion
- Poorly documented: completion rate below 80%
- Properly documented: completion rate above 80%
- Fully documented: 100% completion (no missing value)

SAMPLE QUALITY (BOTTOM OF THE PAGE)
It is problematic when questions are poorly documented, as happens here for the ‘numerical variables’. In some cases this depends on the nature of the question; in other cases, you should ask yourself whether the question was correctly formulated.
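The 80% cut-off can be applied by hand to exported data. The following is a minimal sketch under stated assumptions: the variable names and responses are made up, `None` marks a missing response, the completion rate of a variable is taken as the share of respondents who answered it, and a rate of exactly 80% is counted as "properly documented" (the slides do not say which side of the cut-off it falls on).

```python
def count_answered(values):
    """Return (number of non-missing responses, number of observations)."""
    answered = sum(v is not None for v in values)
    return answered, len(values)


def classify(answered, total):
    """Label a variable using the 80% cut-off; integer arithmetic avoids rounding issues."""
    if total == 0:
        raise ValueError("no observations")
    if answered == total:
        return "fully documented"
    if answered * 100 >= 80 * total:  # assumption: exactly 80% counts as properly documented
        return "properly documented"
    return "poorly documented"


# Hypothetical responses for three variables, five respondents each.
responses = {
    "gender": ["Woman", "Man", "Woman", "Man", "Woman"],
    "price_paid": [15000, None, None, 22000, 18000],
    "brand": ["Audi", "BMW", None, "Renault", "Citroën"],
}

for name, values in responses.items():
    answered, total = count_answered(values)
    print(name, f"{answered / total:.0%}", classify(answered, total))
```

A variable flagged as poorly documented is a prompt to reread the question, not an automatic reason to drop it.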