Survey Data Coding Techniques

Questions and Answers

Which test would be appropriate for examining the association between two nominal variables?

Spearman correlation
Pearson correlation
ANOVA
Crosstabs with chi-square test (correct)

What should be examined before analyzing correlations between continuous data?

Descriptive statistics of the dataset
Box plots for outliers
Scatterplots of the variables (correct)
Histograms of the data

Which of the following is NOT a requirement for performing a Pearson correlation?

Linear relationship
Continuous data
Normality of residuals (correct)
Observations must be independent

What does the homogeneity test assess in the context of ANOVA?

Equal variances among groups (A)

What is a critical step to take after using the 'Split file' function in data analysis?

Run the analysis without groups (D)

What is the purpose of dividing a ranking question into smaller problems?

To translate a single question into several distinct variables (C)

Which of the following represents user-missing values in survey data?

Values defined as 99 = refused to answer (A), Values defined as 97 = does not apply (C)

In conducting a survey, how should missing values be treated?

They can be left empty or defined explicitly (D)

What is a characteristic of straightforward coding?

It simplifies the coding process for quantitative data (B)

Which course could be considered the easiest based on a typical coding framework?

Health psychology (D)

What numerical code is assigned to female respondents in the questionnaire?

2 (D)

Which method provides information about categorical variables?

Frequency distributions (A)

What score represents the distance in multivariate space for respondents?

Mahalanobis distance (D)

What could cause an outlier such as a weight of 888 kg in a dataset?

Incorrect data entry (D)

What should be examined to identify potential outliers in the dataset?

Score patterns of suspicious individuals (B)

Which regression model specification helps in saving the Mahalanobis distance?

All variables as predictors (D)

What type of outlier is identified by examining one variable at a time?

Univariate outlier (B)

What is primarily used for analyzing continuous variables?

Descriptive statistics (D)

What is a potential consequence of removing outliers from a sample?

The results may significantly differ if outliers are excluded. (B), The statistical analysis can be affected by the randomization of the sample. (D)

Which measure of central tendency is least affected by skewness in data?

Median (B)

When conducting an Independent Samples T-Test in SPSS, what is the primary dependent variable being analyzed?

Age (D)

What would you use as a summary measure for nominal data?

Mode (B)

What is one of the advantages of reporting analyses both with and without outliers?

It allows for a comprehensive understanding of data impact. (C)

What should you be cautious of when analyzing data for central tendency?

Assuming a normal distribution in all cases. (A), Relying solely on the mode for continuous data. (B), Using the mean for categorical data. (D)

In the context of analyzing more than two independent groups, which model is appropriate for a continuous dependent variable?

General Linear Model (B)

Which of the following is NOT a recommended action when dealing with outliers?

Delete extreme cases to ensure randomness. (D)

Flashcards

Coding in Surveys

Assigning numerical values to responses in a survey, allowing for analysis. This can involve simple categories or more complex scales.

Forced Choice Coding

The process of simplifying a ranking question by presenting pairs of choices, making it easier for respondents to select their preference.

Missing Values in Surveys

Data points where information is unavailable, either due to missing input or the respondent's deliberate choice not to provide it.

Explicit Missing Value Codes

Specifying a specific code to represent a particular reason for a missing value, such as 'not applicable' or 'don't know'.

String Values in Surveys

Using text-based values for variables, such as 'male' or 'female', which can be analyzed alongside numerical data.

Outliers

Values that are significantly different from the majority of data points in a dataset. They can be unusually high or low.

Mode

The most frequently occurring value in a dataset. It is the only measure of central tendency that is meaningful for nominal data.

Boxplot

A graphical representation of data that displays the distribution of a variable. Boxplots show the median, quartiles, and potential outliers.

Independent Samples T-Test

A statistical test used to compare the means of two independent groups. It is used to determine if there is a significant difference between the groups.

General Linear Model (GLM) - Univariate

A statistical technique used to analyze data when there are more than two groups. It's often used for comparing means across different groups.

Analyze with and without outliers

Running the same analysis twice, once on the full dataset and once with the outliers excluded, and reporting both sets of results so that readers can see how strongly the outliers influence the conclusions.

Median

A measure of central tendency that represents the middle value in a sorted dataset. It is much less influenced by extreme values than the mean.

Mean

A statistical measure that represents the average of a dataset. It's sensitive to extreme values.

Mahalanobis Distance

A score for each respondent that represents their distance in a multivariate space from the average respondent.

Bivariate Outlier

A type of outlier that occurs when a data point deviates significantly from the pattern of the other points in a scatterplot.

Multivariate Outlier

A type of outlier that occurs when a data point deviates significantly from the pattern of the other points in a multidimensional space.

Multivariate Screening

A technique to identify outliers by calculating the Mahalanobis distance for each respondent, allowing researchers to examine the distribution of these distances to pinpoint potential outliers.

Bivariate Screening

A technique used to identify anomalous combinations of values in pairs of variables, for example with a scatterplot or a cross-table.

Screening per Variable

The process of examining individual variables to identify any potential issues or anomalies in the data.

Data Screening

The initial step in data analysis, aimed at ensuring the quality and consistency of collected data. It involves checking for errors, inconsistencies, and missing values.

Homogeneity Test

A statistical test used to check if the variances within different groups are equal. This is important for ANOVA, as it assumes equal variances across groups for accurate results.

Chi-Square (χ²) Test

A statistical test used to determine if there is a significant difference between observed frequencies and expected frequencies in a data set. It's often used for association between categorical variables.

Goodness-of-Fit Test

A statistical test used to determine if an observed frequency distribution is significantly different from a theoretical distribution. This helps determine how well the data aligns with the expected pattern.

Association Between Variables

A type of analysis that checks if there's a relationship between two variables. It's helpful for identifying if two factors are related to one another.

ANOVA (Analysis of Variance)

A statistical test used to determine whether the means of a continuous dependent variable differ across two or more groups. If the p-value is less than the significance level, the null hypothesis of equal group means is rejected.

Study Notes

Coding and Screening for Surveys

Straightforward coding involves assigning clear numerical values for variables like age, gender, and opinions.
Age is coded as "age in years" in SPSS.
Gender is coded as 1 = male, 2 = female in SPSS.
Opinions are coded as 1=strongly disagree, 2=disagree, 3=neutral, 4=agree, 5=strongly agree in SPSS.
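
As a rough illustration of straightforward coding, the sketch below maps raw answers onto the numeric codes listed above. It is written in Python rather than SPSS; the dictionary names, the function, and the example respondent are hypothetical, and only the code values themselves come from these notes.

```python
# Codebook values taken from the notes; all names are illustrative.
GENDER_CODES = {"male": 1, "female": 2}
OPINION_CODES = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}


def code_response(raw):
    """Turn one raw questionnaire record into numeric codes."""
    return {
        "age": int(raw["age"]),  # age is stored directly, in years
        "gender": GENDER_CODES[raw["gender"].strip().lower()],
        "opinion": OPINION_CODES[raw["opinion"].strip().lower()],
    }


# Hypothetical respondent, for illustration only.
coded = code_response({"age": "34", "gender": "Female", "opinion": "Agree"})
print(coded)
```

An answer that is not in the codebook raises a KeyError here, which is one simple way to catch unexpected responses at entry time.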

Not So Straightforward Coding

Coding more complex questions, like ranking courses by difficulty, requires careful design.
Forced choice format breaks rankings into smaller, pairwise comparisons.
This allows translating one ranking question into multiple distinct variables.
Multiple answers are allowed when asking which courses participants liked most from a list.
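
The translation of one ranking into several variables can be sketched as follows. This is a minimal Python illustration, not the layout used in the course: the course names, the variable naming scheme, and the 0/1 coding of the multiple-answer question are all assumptions.

```python
from itertools import combinations


def ranking_to_pairs(ranking):
    """Translate one ranking (easiest first) into one variable per pair.

    Each pairwise variable holds the course judged easier in that pair.
    """
    position = {course: i for i, course in enumerate(ranking)}
    pairs = {}
    for a, b in combinations(sorted(ranking), 2):
        pairs[f"{a}_vs_{b}"] = a if position[a] < position[b] else b
    return pairs


def multiple_answers_to_indicators(chosen, options):
    """One 0/1 variable per option for a 'tick all that apply' question."""
    return {option: int(option in chosen) for option in options}


# Hypothetical course names and answers.
courses = ["health", "methods", "statistics"]
pair_vars = ranking_to_pairs(["health", "statistics", "methods"])
liked = multiple_answers_to_indicators({"health", "methods"}, courses)
print(pair_vars)
print(liked)
```

A ranking of three courses yields three pairwise variables, and a list of three options yields three indicator variables, so each respondent still occupies a single row.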

Some Remarks about Missing Values

Missing values in surveys can be left empty (system-missing) or given an explicitly defined code (user-missing).
These codes can differentiate between different types of missing data (e.g., "not applicable").
Examples of missing value codes include 97 = does not apply, 98 = don't know, and 99 = refused to answer.
Using numerical codes (e.g., 1 = male, 2 = female) and value labels is recommended for variables like gender.
Assign a unique identifier to each participant for tracking and data analysis.
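
The sketch below shows why the missing-value codes have to be declared before any statistics are computed. It is an illustrative Python example, not SPSS behaviour: the answers are made up, and None stands in for an empty (system-missing) cell.

```python
# Missing-value codes from the notes; None represents an empty cell.
USER_MISSING = {97: "does not apply", 98: "don't know", 99: "refused to answer"}


def split_valid_and_missing(values):
    """Separate valid answers from system-missing and user-missing ones."""
    valid, missing = [], []
    for value in values:
        if value is None:
            missing.append("system missing")
        elif value in USER_MISSING:
            missing.append(USER_MISSING[value])
        else:
            valid.append(value)
    return valid, missing


# Hypothetical answers on a 1-5 opinion item.
answers = [4, 99, 2, None, 97, 5]
valid, missing = split_valid_and_missing(answers)

mean_valid = sum(valid) / len(valid)
# Forgetting to declare 97 and 99 as missing inflates the mean badly.
non_empty = [v for v in answers if v is not None]
naive_mean = sum(non_empty) / len(non_empty)
print(mean_valid, naive_mean, missing)
```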

Screening Example Data

Example values for the variables sex, age, anxiety, IQ, married, and income are given in a table.
Age is in years.
Anxiety is on a 1-7 ordinal scale.
Married is the number of years married.
Income is in 5 categories (1 = ≤ 1500, 2 = 1501-2500, ..., 5 = > 5000).
Other variables, with ranges and descriptive statistics like means and standard deviations, are shown.

Screening per Variable

Descriptive statistics for several variables are presented in a table, including number of brothers/sisters, number of children, age, education years completed (self, father, mother, spouse), R’s occupation prestige score, and occupational category.
Data summaries are provided for continuous and categorical variables, including frequency distributions and cross-tables.
Additional categorical data for respondents' sex and most important problems in the last 12 months (e.g. Finance, Health, Lack of Basic services) is included.
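
A per-variable screen can be sketched in a few lines of Python: a frequency distribution for a categorical variable and basic descriptives for a count or continuous variable. The data below are invented for illustration and are not the values from the table in the notes.

```python
from collections import Counter
import statistics

# Hypothetical data: sex coded 1 = male, 2 = female; number of children.
sex = [1, 2, 2, 1, 2, 2]
children = [0, 2, 1, 3, 2, 0]

# Frequency distribution (counts and percentages) for the categorical variable.
freq = Counter(sex)
percent = {code: 100 * n / len(sex) for code, n in freq.items()}

# Descriptive statistics for the numeric variable.
summary = {
    "n": len(children),
    "min": min(children),
    "max": max(children),
    "mean": statistics.mean(children),
    "sd": statistics.stdev(children),
}
print(dict(freq), percent)
print(summary)
```

Codes that should not exist (for example a 3 for sex) show up immediately in the frequency table, and impossible values show up in the minimum and maximum.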

Bivariate Screening

Bivariate screening checks for unexpected combinations of values in pairs of variables.
A scatterplot is useful for visualizing relationships between continuous variables and identifying potential outliers.
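
A bivariate check can also be expressed as a simple rule on a pair of variables. The Python sketch below assumes, purely for illustration, that the number of years married can never exceed a respondent's age; the rule, the records, and the function name are hypothetical.

```python
# Hypothetical respondents: age in years and number of years married.
respondents = [
    {"id": 1, "age": 45, "married": 20},
    {"id": 2, "age": 23, "married": 30},  # each value is plausible on its own
    {"id": 3, "age": 60, "married": 0},
]


def suspicious_pairs(rows):
    """Return the ids of respondents married for more years than they are old."""
    return [row["id"] for row in rows if row["married"] > row["age"]]


flagged = suspicious_pairs(respondents)
print(flagged)
```

Respondent 2 would pass a screen of age alone and a screen of years married alone; only the combination is impossible.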

Multivariate Screening

Multivariate screening examines multiple variables together to identify outliers.
Mahalanobis distance calculates the distance between a respondent and the average respondent in a multi-dimensional space.
High Mahalanobis distances indicate potential outliers.
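
To make the idea concrete, here is a minimal Python sketch that computes the squared Mahalanobis distance for two variables only, so that the inverse of the covariance matrix can be written out by hand. The data are invented, and a real screening would include all variables of interest and use a statistics package.

```python
import statistics


def mahalanobis_squared_2d(xs, ys):
    """Squared Mahalanobis distance of each (x, y) pair from the centroid."""
    n = len(xs)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    # Sample variances and covariance (n - 1 in the denominator).
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2
    distances = []
    for x, y in zip(xs, ys):
        dx, dy = x - mx, y - my
        # d^2 = [dx dy] S^-1 [dx dy]^T, with the 2x2 inverse written out.
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances


# Hypothetical ages and years married; the last respondent is an odd combination.
age = [25, 35, 45, 55, 65, 30]
married = [2, 10, 20, 30, 40, 38]

d2 = mahalanobis_squared_2d(age, married)
most_unusual = d2.index(max(d2))
print([round(d, 2) for d in d2], most_unusual)
```

The last respondent has the largest distance even though neither the age nor the years married is extreme on its own, which is exactly the kind of case that per-variable screening misses.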

Potential Outliers

Examine values that lie far from the mean of a variable to determine whether they differ markedly from the other values.
Look for irregular combinations of values across variables, as these suggest potential outliers.
Scrutinize the data for entry errors: a weight of 888 kg is suspicious and should be checked.
Assess whether respondents fall outside the intended population.
Check whether the sample consists of multiple, distinct subgroups.
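
A plausible-range check is one simple way to surface data entry errors such as the 888 kg weight. In the Python sketch below the weights and the bounds of 30 and 300 kg are assumptions chosen for illustration; suitable bounds depend on the population being studied.

```python
def out_of_range(values, low, high):
    """Return (index, value) pairs that fall outside the plausible range."""
    return [(i, v) for i, v in enumerate(values) if not low <= v <= high]


# Hypothetical body weights in kg. A value like 888 may be a typing error
# or a missing-value code that was never declared as missing.
weights = [62, 75, 888, 81, 58]
suspects = out_of_range(weights, low=30, high=300)
print(suspects)
```

The check only flags the case. Whether the value is corrected, set to missing, or kept is a separate decision that should be made by going back to the original questionnaire where possible.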

Handling Outliers

Strategies for handling outliers include reducing their influence, transforming the values so that they lie closer to the mean, or deleting them when the deletion can be justified.
Carefully consider whether removing outliers maintains the sample's randomness.
Always report both the analysis with and without outliers and the rationale behind decisions to help maintain transparency.
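
Reporting both versions of an analysis can be as simple as running the same summary twice. The Python sketch below uses invented monthly incomes and an arbitrary cut-off of 10000 to define the outlier; both are assumptions made for the example.

```python
import statistics


def summarize(values):
    """Return the sample size, mean, and median of a list of values."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


# Hypothetical monthly incomes with one extreme value.
incomes = [2100, 2400, 1900, 2600, 25000]

with_outliers = summarize(incomes)
without_outliers = summarize([x for x in incomes if x <= 10000])
print(with_outliers)
print(without_outliers)
```

In this example the mean changes considerably when the extreme case is dropped while the median barely moves, which is the kind of difference a report should make visible.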

Analyses

Central tendency measures (mean, median, mode) summarize data distributions.
Histograms, boxplots, and various SPSS analysis options are used to understand and visualize the distributions of variables like age and sex.
Appropriate statistical tests (e.g., t-tests) have to be selected for investigating differences or comparing characteristics across groups or conditions.
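
The three measures of central tendency can be compared on a small example. The Python sketch below uses the standard library statistics module; the ages and sex codes are invented, with the ages deliberately skewed to the right.

```python
import statistics

# Hypothetical, right-skewed ages and nominal sex codes (1 = male, 2 = female).
ages = [21, 22, 22, 23, 25, 29, 64]
sex = [1, 2, 2, 1, 2, 2, 2]

age_mean = statistics.mean(ages)      # pulled upwards by the value 64
age_median = statistics.median(ages)  # middle value of the sorted ages
sex_mode = statistics.mode(sex)       # the only sensible summary for nominal codes

print(round(age_mean, 2), age_median, sex_mode)
```

With skewed data the mean lies above the median, and for a nominal variable such as sex a mean of the codes would have no interpretation.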

More Than Two Independent Groups

More than two independent groups can be compared on a continuous dependent variable using the General Linear Model (GLM) in SPSS, provided its assumptions are checked.
Assumptions include normality of residuals in each subgroup, absence of significant outliers, and equal variances of the dependent variable across subgroups.
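
The F-test behind a comparison of several groups can be sketched by hand. The Python example below computes the one-way ANOVA F statistic and its degrees of freedom for three invented groups, and prints the group variances as an informal look at the equal-variance assumption. It does not compute a p-value or a formal homogeneity test, which a statistics package would provide.

```python
import statistics


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for group in groups for v in group]
    grand_mean = statistics.mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(group) * (statistics.mean(group) - grand_mean) ** 2 for group in groups
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (v - statistics.mean(group)) ** 2 for group in groups for v in group
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical scores for three independent groups.
groups = [[4, 5, 6], [6, 7, 8], [9, 10, 11]]

f_value, df_between, df_within = one_way_anova_f(groups)
group_variances = [statistics.variance(group) for group in groups]
print(f_value, df_between, df_within, group_variances)
```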

Histograms and Split Files

Splitting a file in SPSS lets users analyze and plot data individually for specific subgroups or groups based on categorical variables.
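
The effect of splitting a file can be imitated in Python by grouping the rows on a categorical variable and repeating the same analysis per group. The records and the helper function below are hypothetical and only illustrate the idea.

```python
from collections import defaultdict
import statistics


def split_by(rows, key):
    """Group rows by the value of a categorical variable."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


# Hypothetical respondents: sex coded 1 = male, 2 = female.
rows = [
    {"sex": 1, "age": 30},
    {"sex": 2, "age": 40},
    {"sex": 1, "age": 50},
    {"sex": 2, "age": 20},
    {"sex": 2, "age": 30},
]

by_sex = split_by(rows, "sex")
mean_age_by_sex = {
    code: statistics.mean(row["age"] for row in group)
    for code, group in by_sex.items()
}
# The analysis on all cases together, i.e. without any split.
overall_mean_age = statistics.mean(row["age"] for row in rows)
print(mean_age_by_sex, overall_mean_age)
```

The per-group results and the overall result answer different questions, so it matters whether a split is still active when the next analysis is run.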

Output ANOVA through GLM, Associations between Variables

Output from the General Linear Model (GLM) procedure, including ANOVA results and F-tests for analyses with more than two groups, is shown for different categories like respondents' education levels.
To analyze the relationship between two variables, choose a method that matches the measurement level: Spearman correlation for ordinal data, Pearson correlation for continuous data, and the chi-squared test for nominal data. To prevent misinterpretation, a correlation analysis should always begin with an examination of the scatterplot to check for patterns, outliers, and linearity.
Significance values from correlation tests represent the probability of obtaining a result at least as extreme as the one observed if there is no true relationship between the variables.
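
The difference between the two correlation coefficients can be seen on a small example. The Python sketch below implements Pearson's r directly and obtains Spearman's rho as Pearson's r on the ranks; the data are invented and follow a curved but strictly increasing pattern. Significance values are not computed here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Average ranks (starting at 1), so tied values share the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(ranks(xs), ranks(ys))


# Hypothetical data: y increases with x, but not in a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

r = pearson_r(x, y)
rho = spearman_rho(x, y)
print(round(r, 3), round(rho, 3))
```

Spearman's rho is 1 because the ordering is perfectly preserved, while Pearson's r is slightly lower because the relationship is not linear. A scatterplot would show the curve at a glance.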

χ² Tests

Chi-squared (χ²) tests indicate whether observed frequencies in a categorical variable differ from expected frequencies.
These tests can also check whether two categorical variables are independent of each other, based on the observed frequencies in their cross-tabulation.
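
The comparison of observed and expected frequencies can be written out in a few lines. The Python sketch below computes the Pearson chi-square statistic and its degrees of freedom for a cross-table of two categorical variables; the 2 x 2 counts are invented, and the p-value is left to a statistics package.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a 2-D table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Hypothetical counts: rows are one nominal variable, columns another.
observed = [[30, 10], [20, 40]]

statistic, df = chi_square(observed)
print(round(statistic, 2), df)
```

With one degree of freedom the critical value at the 5% level is about 3.84, so a statistic well above that points to an association between the two variables.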

Crosstabs in SPSS

The Crosstabs procedure in SPSS creates cross-tabulation tables that show frequencies and percentages for each combination of categories of two categorical variables. The notes show example variable options for this procedure.
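
What a cross-tabulation contains can be illustrated outside SPSS as well. The Python sketch below counts the combinations of two categorical variables and converts the counts to row percentages; the variable names, the records, and the helper functions are hypothetical.

```python
from collections import Counter


def crosstab(rows, row_var, col_var):
    """Return row levels, column levels, and a table of counts."""
    counts = Counter((row[row_var], row[col_var]) for row in rows)
    row_levels = sorted({row[row_var] for row in rows})
    col_levels = sorted({row[col_var] for row in rows})
    table = [[counts[(a, b)] for b in col_levels] for a in row_levels]
    return row_levels, col_levels, table


def row_percentages(table):
    """Express each cell as a percentage of its row total."""
    return [[100 * cell / sum(row) for cell in row] for row in table]


# Hypothetical respondents: sex (1 = male, 2 = female) and most important problem.
rows = [
    {"sex": 1, "problem": "finance"},
    {"sex": 1, "problem": "health"},
    {"sex": 1, "problem": "health"},
    {"sex": 1, "problem": "health"},
    {"sex": 2, "problem": "finance"},
    {"sex": 2, "problem": "finance"},
    {"sex": 2, "problem": "finance"},
    {"sex": 2, "problem": "health"},
]

row_levels, col_levels, table = crosstab(rows, "sex", "problem")
percentages = row_percentages(table)
print(row_levels, col_levels, table, percentages)
```

Row percentages make groups of different sizes comparable, which raw counts alone do not.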
