Data Types PDF
Document Details
Ivan Buljan
Summary
This presentation covers different types of data, including qualitative and quantitative variables. It explores various data types, such as nominal, ordinal, interval, and ratio, and discusses their characteristics and uses in data analysis. The presentation also touches upon data entry techniques, highlighting the importance of accurate record-keeping.
Full Transcript
Data types
Ivan Buljan, Assistant professor

Slide 01: Content
  Variables
  Independent and dependent variables
  Scales
  Examples
  Data entry
  Missing variables

Slide 02: Variables/participants/data

         | Height (cm) | Weight (kg) | Age (in years) | Sex (category)
Person 1 | 176         | 78          | 24             | Female
Person 2 | 180         | 90          | 28             | Male

Slide labels: Variables (the columns), Participants (the rows), Data (the cell values).

Slide 03: Variable
A variable is the property of a phenomenon to take on different values.
Qualitative variables: descriptive, non-numeric (e.g., gender, descriptive grades, profession).
Quantitative variables: values or attributes that can be expressed in numerical terms (e.g., questionnaire scores, height, weight, number of children).

Slide 04: Why do you want to know your variables?
Because the type of variable determines:
A) the type of test we can use, which is related to
B) the hypothesis we want to test
C) the conclusions we can draw from the data
It is important to note that different data types carry different levels of information.

Slide 05: Independent and dependent variables

Variable     | Scale type | Characteristics      | Example
Qualitative  | Nominal    | Unordered categories | Sex, urbanization
Qualitative  | Ordinal    | Ordered categories   | Grades, ranking
Quantitative | Interval   | No absolute zero     | Intelligence, psychological questionnaires
Quantitative | Ratio      | Has absolute zero    | Height, weight

Slide 06: Nominal variables
Definition: Numbers serve as labels or names.
Example: Gender (male = 0, female = 1).
Characteristics: No inherent order or ranking.
Application: Useful for categorizing data without numerical significance.

Slide 07: Ordinal variables
Definition: Numbers indicate order, but the differences between them are not known.
Example: School grades (1, 2, 3, 4, 5).
Characteristics: Shows relative ranking but not the exact differences.
Application: Useful for ranking and order without precise measurements.

Slide 08: Interval variables
Definition: Numbers indicate order and precise differences, but there is no true zero point.
Example: Temperature (20 ºC is not twice as warm as 10 ºC).
Characteristics: Equal intervals between values, but zero does not indicate 'none'.
Application: Useful for measuring variables where ratios are not meaningful.
Equidistance principle.

Slide 09: Ratio variables
Definition: Numbers indicate order and precise differences, and there is a true zero point.
Example: Length (20 cm is twice as long as 10 cm).
Characteristics: Allows for meaningful comparisons of ratios.
Application: Useful for measuring variables where both differences and ratios are meaningful.

Slide 10: Type of data
  Height
  Study program
  Anxiety test score
  Political affiliation
  Age in days
  Social status

Slide 11: Type of data?
How many cigarettes do you smoke daily?
  1-5
  6-10
  11-15
  16-20
  20 and more

Slide 12: Type of data?
Are you traditional? Yes/No
What is your political affiliation? Right center/Moderately right/Strongly right/Extremely right
I consider myself a conservative person: 1 = Completely agree, 2 = Agree, 3 = Neither agree nor disagree, 4 = Disagree, 5 = Strongly disagree
Researchers considered the number of traditional items (pictures, magazines, religious symbols, etc.) visible in the participant's room as a measure of traditionalism.

Slide 13: Data entry
1. The first entry is always done in Excel (RAW DATA).
2. In general, each column represents one variable you are testing. As a rule of thumb, if you are entering survey data, enter the variables in the same sequence as they appear in the survey, and give them full names that you will recognize.
3. If someone skipped a question or did not answer it, leave the cell blank.
4. Qualitative data are entered as text.
5. Always make a code book.
6. After you have entered the data, save the first version and do not do anything with that raw data.
7. Save it on an external disc, cloud drive, USB, three computers, and email.

Slide 14: Data entry

Participants | Gender | Age | Depression score | Number of pills per week | Marital status    | Any comments:
1            | Male   | 28  | 14               | 0                        | Divorced          | /
2            | Female | 45  | 19               | 0                        | Married           | Interesting survey
3            | Female | 25  | 8                | 2                        | Married           | A bit boring and too long
4            | Male   | 35  | 21               | 3                        | In a relationship | TY
5            | Female | 30  | 16               | 3                        | Single            |

Slide 15: Repeated measurements
Remember, columns represent variables, and rows represent participants.
When a participant goes through one, two, or three measurements, each measurement is a new variable and must have its own column.
If you have multiple measurements, give them clear names so that you do not mix them up, and keep them in a logical order.

Slide 16: Repeated measurements

Participant | Measurement 1 | Measurement 2 | Measurement 3 | Measurement 4
1           | 23            | 43            | 27            | 31
2           | 44            | 23            | 43            | 13
3           | 32            | 23            | 64            | 29
4           | 45            | 21            | 54            | 12
5           | 18            | 37            | 34            | 16

Slide 17: How to enter "Select all that apply"?
Which of the following ice cream flavors do you like?
A) Chocolate
B) Vanilla
C) Hazelnut
D) Tutti frutti
E) Cherry
F) Apple

Slide 18: How to enter "Select all that apply"?
Which of the following ice cream flavours do you like? (one row per participant)

Chocolate | Vanilla | Hazelnut | Tutti frutti | Cherry | Apple
1         | 1       | 1        | 1            | 0      | 1
1         | 0       | 0        | 0            | 0      | 0
1         | 1       | 1        | 0            | 1      | 0
1         | 0       | 0        | 0            | 1      | 1
1         | 1       | 0        | 0            | 0      | 1

Slide 20: Missing Data
The previous slide is a fairy tale, a nice example of something that does not happen often in the real world.
In the real world, some participants are lost to follow-up for various reasons.
The problem is that your conclusions are then limited because the sample size is decreased.
E.g., on the previous slide, N is 5 in all four measurements.

Slide 21: Missing Data

Participant | Measurement 1 | Measurement 2 | Measurement 3 | Measurement 4
1           | 23            |               | 27            | 31
2           | 44            | 23            | 43            | 13
3           | 32            | 23            |               | 29
4           | 45            | 21            | 54            | 12
5           | 18            | 37            | 34            |

Slide 22: Missing Data
What is the N on the previous slide from which we may draw conclusions about the differences between the four measurements?
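A minimal Python sketch of how the N asked about above could be counted. It assumes the blank cells in the Missing Data table fall where its values differ from the earlier repeated-measurements table, which is an inference from the transcript and not something the slides state. The names `measurements`, `complete_cases`, and `n_per_measurement` are hypothetical, and `None` stands in for a blank cell.

```python
# Hypothetical encoding of the Missing Data table; None marks a blank cell.
# ASSUMPTION: the positions of the blanks are inferred by comparing the
# table with the complete repeated-measurements table.
measurements = {
    1: [23, None, 27, 31],
    2: [44, 23, 43, 13],
    3: [32, 23, None, 29],
    4: [45, 21, 54, 12],
    5: [18, 37, 34, None],
}


def complete_cases(data):
    """Return the participants who have a value for every measurement."""
    return [
        participant
        for participant, values in data.items()
        if all(value is not None for value in values)
    ]


def n_per_measurement(data):
    """Return how many participants have a value in each measurement column."""
    columns = zip(*data.values())
    return [sum(value is not None for value in column) for column in columns]


usable = complete_cases(measurements)
print("Participants with all measurements:", usable)
print("N for comparing all measurements:", len(usable))
print("N per single measurement:", n_per_measurement(measurements))
```

Keeping only participants with every value present is often called listwise deletion. Each measurement on its own still has more participants than the complete set, which is why the usable N depends on which measurements are being compared.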
N = 2

Hypothetical situation: you need to collect 200 participants in a survey, and you collect just 200. However, 35 of them have left a few questions in your questionnaire unanswered, so you cannot calculate the sum of the answers for those participants, and you are now stuck with only 175 participants.

Slide 19: Reverse-coded variables
Some programs have a recode function (SPSS, R…). However, it is sometimes simpler just to do it in Excel.
E.g., if you have an item ranging from 1 to 5 and you want that variable to be reverse-coded (1 becomes 5, 2 becomes 4, etc.), use the following algorithm:
Take the highest number in your range (in this case 5) and add 1 (so it is 6).
Then subtract the score in the cell from 6, and you get the recoded item.
E.g., someone has a result of 2 on a reverse-coded variable and you want to recode it: 6 - 2 is 4…

Thank you for your attention! Questions?
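The reverse-coding rule from the slide on reversely coded variables (highest value plus 1, minus the score) could be sketched in Python as follows. The function name `reverse_code` and its `low` and `high` parameters are hypothetical. Using `low + high - score` is an assumed generalization that equals the slide's rule whenever the scale starts at 1. Blank cells (`None`) are passed through unchanged, in line with the advice to leave unanswered items blank.

```python
def reverse_code(score, low=1, high=5):
    """Reverse-code one item score on a scale running from low to high.

    For a 1-5 item this computes 6 - score, as described on the slide.
    ASSUMPTION: low + high - score generalizes that rule to other ranges.
    """
    if score is None:
        # A blank (unanswered) item stays blank.
        return None
    if not low <= score <= high:
        raise ValueError(f"score {score} is outside the range {low}-{high}")
    return (low + high) - score


# Hypothetical answers to one item; None marks an unanswered question.
answers = [1, 2, 3, 4, 5, None]
recoded = [reverse_code(answer) for answer in answers]
print(recoded)
```

Raising an error for out-of-range scores is a design choice made here so that data-entry mistakes surface during recoding instead of silently producing impossible values.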