Survey Data Coding Techniques
26 Questions

Questions and Answers

Which test would be appropriate for examining the association between two nominal variables?

  • Spearman correlation
  • Pearson correlation
  • ANOVA
  • Crosstabs with chi-square test (correct)

What should be examined before analyzing correlations between continuous data?

  • Descriptive statistics of the dataset
  • Box plots for outliers
  • Scatterplots of the variables (correct)
  • Histograms of the data

Which of the following is NOT a requirement for performing a Pearson correlation?

  • Linear relationship
  • Continuous data
  • Normality of residuals (correct)
  • Observations must be independent

What does the homogeneity test assess in the context of ANOVA?

  • Equal variances among groups (correct)

What is a critical step to take after using the 'Split file' function in data analysis?

  • Run the analysis without groups (correct)

What is the purpose of dividing a ranking question into smaller problems?

  • To translate a single question into several distinct variables (correct)

Which of the following represents user-missing values in survey data?

  • Values defined as 99 = refused to answer (correct)
  • Values defined as 97 = does not apply (correct)

In conducting a survey, how should missing values be treated?

  • They can be left empty or defined explicitly (correct)

What is a characteristic of straightforward coding?

  • It simplifies the coding process for quantitative data (correct)

Which course could be considered the easiest based on a typical coding framework?

  • Health psychology (correct)

What numerical code is assigned to female respondents in the questionnaire?

  • 2 (correct)

Which method provides information about categorical variables?

  • Frequency distributions (correct)

What score represents the distance in multivariate space for respondents?

  • Mahalanobis distance (correct)

What could cause an outlier such as a weight of 888 kg in a dataset?

  • Incorrect data entry (correct)

What should be examined to identify potential outliers in the dataset?

  • Score patterns of suspicious individuals (correct)

Which regression model specification helps in saving the Mahalanobis distance?

  • All variables as predictors (correct)

What type of outlier is identified by examining one variable at a time?

  • Univariate outlier (correct)

What is primarily used for analyzing continuous variables?

  • Descriptive statistics (correct)

What is a potential consequence of removing outliers from a sample?

  • The results may significantly differ if outliers are excluded. (correct)
  • The statistical analysis can be affected by the randomization of the sample. (correct)

Which measure of central tendency is least affected by skewness in data?

  • Median (correct)

When conducting an Independent Samples T-Test in SPSS, what is the primary dependent variable being analyzed?

  • Age (correct)

What would you use as a summary measure for nominal data?

  • Mode (correct)

What is one of the advantages of reporting analyses both with and without outliers?

  • It allows for a comprehensive understanding of data impact. (correct)

What should you be cautious of when analyzing data for central tendency?

  • Assuming a normal distribution in all cases. (correct)
  • Relying solely on the mode for continuous data. (correct)
  • Using the mean for categorical data. (correct)

In the context of analyzing more than two independent groups, which model is appropriate for a continuous dependent variable?

  • General Linear Model (correct)

Which of the following is NOT a recommended action when dealing with outliers?

  • Delete extreme cases to ensure randomness. (correct)

Flashcards

Coding in Surveys

Assigning numerical values to responses in a survey, allowing for analysis. This can involve simple categories or more complex scales.

Forced Choice Coding

The process of simplifying a ranking question by presenting pairs of choices, making it easier for respondents to select their preference.

Missing Values in Surveys

Data points where information is unavailable, either due to missing input or the respondent's deliberate choice not to provide it.

Explicit Missing Value Codes

Assigning a dedicated code to each reason for a missing value, such as 'not applicable' or 'don't know'.

String Values in Surveys

Using text-based values for variables, such as 'male' or 'female', which can be analyzed alongside numerical data.

Outliers

Values that are significantly different from the majority of data points in a dataset. They can be unusually high or low.

Mode

A statistical measure that summarizes the most frequent value in a dataset.

Boxplot

A graphical representation of data that displays the distribution of a variable. Boxplots show the median, quartiles, and potential outliers.

Independent Samples T-Test

A statistical test used to compare the means of two independent groups. It is used to determine if there is a significant difference between the groups.

General Linear Model (GLM) - Univariate

A statistical technique used to analyze data when there are more than two groups. It's often used for comparing means across different groups.

Analyze with and without outliers

Running the same analysis twice, once on the full dataset and once with the outliers excluded, and comparing the results to see how much the outliers affect the conclusions.

Median

A measure of central tendency that represents the middle value in a sorted dataset. It is far less sensitive to extreme values than the mean.

Mean

A statistical measure that represents the average of a dataset. It's sensitive to extreme values.

Mahalanobis Distance

A score for each respondent that represents their distance in a multivariate space from the average respondent.

Bivariate Outlier

A type of outlier that occurs when a data point deviates significantly from the pattern of the other points in a scatterplot.

Multivariate Outlier

A type of outlier that occurs when a data point deviates significantly from the pattern of the other points in a multidimensional space.

Multivariate Screening

A technique to identify outliers by calculating the Mahalanobis distance for each respondent, allowing researchers to examine the distribution of these distances to pinpoint potential outliers.

Bivariate Screening

A technique used to identify anomalous combinations of values across pairs of variables.

Screening per Variable

The process of examining individual variables to identify any potential issues or anomalies in the data.

Data Screening

The initial step in data analysis, aimed at ensuring the quality and consistency of collected data. It involves checking for errors, inconsistencies, and missing values.

Homogeneity Test

A statistical test used to check if the variances within different groups are equal. This is important for ANOVA, as it assumes equal variances across groups for accurate results.

Chi-Square (χ²) Test

A statistical test used to determine if there is a significant difference between observed frequencies and expected frequencies in a data set. It's often used for association between categorical variables.

Goodness-of-Fit Test

A statistical test used to determine if an observed frequency distribution is significantly different from a theoretical distribution. This helps determine how well the data aligns with the expected pattern.

Association Between Variables

A type of analysis that checks if there's a relationship between two variables. It's helpful for identifying if two factors are related to one another.

ANOVA (Analysis of Variance)

A statistical test used to determine whether the means of a continuous dependent variable differ across two or more groups. If the p-value is less than the significance level, the null hypothesis of equal group means is rejected in favor of the alternative hypothesis.

Study Notes

Coding and Screening for Surveys

  • Straightforward coding involves assigning clear numerical values for variables like age, gender, and opinions.
  • Age is coded as "age in years" in SPSS.
  • Gender is coded as 1 = male, 2 = female in SPSS.
  • Opinions are coded as 1=strongly disagree, 2=disagree, 3=neutral, 4=agree, 5=strongly agree in SPSS.
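
The coding scheme above can be illustrated outside SPSS as well. The following is a minimal Python sketch, assuming a hypothetical respondent record and a hypothetical helper named `label`; only the codes themselves (1 = male, 2 = female, and the 1 to 5 opinion scale) come from the notes.

```python
# Hypothetical codebook that follows the coding scheme in the notes.
GENDER_LABELS = {1: "male", 2: "female"}
OPINION_LABELS = {
    1: "strongly disagree",
    2: "disagree",
    3: "neutral",
    4: "agree",
    5: "strongly agree",
}

# One made-up respondent, stored as numeric codes only.
respondent = {"id": 1, "age": 34, "gender": 2, "opinion": 4}


def label(value, labels):
    """Return the text label for a numeric code, or the raw value if unlabeled."""
    return labels.get(value, value)


gender_text = label(respondent["gender"], GENDER_LABELS)
opinion_text = label(respondent["opinion"], OPINION_LABELS)
print(gender_text, opinion_text)
```

Storing the numbers and keeping the labels in a separate codebook mirrors the idea of value labels: the data stay numeric for analysis, while output can still be shown in words.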

Not So Straightforward Coding

  • Coding more complex questions, like ranking courses by difficulty, requires careful design.
  • Forced choice format breaks rankings into smaller, pairwise comparisons.
  • This allows translating one ranking question into multiple distinct variables.
  • Multiple answers are allowed when asking which courses participants liked most from a list.
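
How a single ranking question becomes several variables can be sketched in Python. The course list, the example ranking, and the function name `pairwise_variables` are all assumptions made for illustration; the 1/2 coding of each pair is likewise just one possible convention.

```python
from itertools import combinations

# Hypothetical course list and one respondent's ranking, hardest first.
courses = ["statistics", "health psychology", "methodology"]
ranking = ["statistics", "methodology", "health psychology"]


def pairwise_variables(ranking, courses):
    """Code each pair as 1 if the first course is ranked harder, 2 otherwise."""
    position = {course: i for i, course in enumerate(ranking)}
    coded = {}
    for a, b in combinations(courses, 2):
        coded[f"{a}_vs_{b}"] = 1 if position[a] < position[b] else 2
    return coded


variables = pairwise_variables(ranking, courses)
for name, code in variables.items():
    print(name, code)
```

With three courses, the one ranking question yields three pairwise variables; with k courses it yields k(k-1)/2.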

Some Remarks about Missing Values

  • Missing values in surveys can be left empty (system missing) or explicitly defined with a code (user missing).
  • These codes can differentiate between different types of missing data (e.g., "not applicable").
  • Examples of missing value codes include 97 = does not apply, 98 = don't know, and 99 = refused to answer.
  • Using numerical codes (e.g., 1 = male, 2 = female) and value labels is recommended for variables like gender.
  • Assign a unique identifier to each participant for tracking and data analysis.
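
The effect of declaring missing-value codes can be shown with a short Python sketch. The codes 97, 98, and 99 come from the notes; the anxiety scores, the use of `None` for an empty cell, and the helper `split_missing` are assumptions for illustration.

```python
# Missing-value codes taken from the notes.
USER_MISSING = {97: "does not apply", 98: "don't know", 99: "refused to answer"}

# Made-up anxiety scores (1-7 scale); None stands for a cell left empty.
anxiety = [3, 99, 5, None, 97, 7]


def split_missing(values, missing_codes):
    """Separate valid values from missing ones and count each missing reason."""
    valid, reasons = [], {}
    for v in values:
        if v is None:
            reasons["system missing"] = reasons.get("system missing", 0) + 1
        elif v in missing_codes:
            reason = missing_codes[v]
            reasons[reason] = reasons.get(reason, 0) + 1
        else:
            valid.append(v)
    return valid, reasons


valid_scores, missing_reasons = split_missing(anxiety, USER_MISSING)
mean_anxiety = sum(valid_scores) / len(valid_scores)
print(mean_anxiety, missing_reasons)
```

If the codes were not declared as missing, 97 and 99 would be treated as real scores and would badly distort the mean of a 1 to 7 scale.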

Screening Example Data

  • Example values for the variables sex, age, anxiety, IQ, married, and income are given in a table.
  • Age is in years.
  • Anxiety is on a 1-7 ordinal scale.
  • Married is the number of years married.
  • Income is in 5 categories (1 = ≤1500, 2 = 1501-2500, ..., 5 = >5000).
  • Other variables, with ranges and descriptive statistics like means and standard deviations, are shown.

Screening per Variable

  • Descriptive statistics for several variables are presented in a table, including number of brothers/sisters, number of children, age, education years completed (self, father, mother, spouse), R’s occupation prestige score, and occupational category.
  • Data summaries are provided for continuous and categorical variables, including frequency distributions and cross-tables.
  • Additional categorical data for respondents' sex and most important problems in the last 12 months (e.g. Finance, Health, Lack of Basic services) is included.

Bivariate Screening

  • Bivariate screening checks for unexpected combinations of values in pairs of variables.
  • A scatterplot is useful for visualizing relationships between continuous variables and identifying potential outliers.

Multivariate Screening

  • Multivariate screening examines multiple variables together to identify outliers.
  • Mahalanobis distance calculates the distance between a respondent and the average respondent in a multi-dimensional space.
  • High Mahalanobis distances indicate potential outliers.
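
To make the idea concrete, here is a minimal Python sketch of the squared Mahalanobis distance for two variables only, with the 2x2 covariance matrix inverted by hand. The data (age and years married) are made up, and the function name `mahalanobis_squared` is an assumption; SPSS computes the distance for any number of variables.

```python
from statistics import mean

# Made-up (age, years married) pairs; the last respondent is an odd combination.
data = [(25, 2), (30, 5), (35, 10), (40, 15), (45, 20), (22, 20)]


def mahalanobis_squared(points):
    """Squared Mahalanobis distance of each 2-D point from the sample centroid."""
    n = len(points)
    mx = mean(p[0] for p in points)
    my = mean(p[1] for p in points)
    # Sample covariance matrix entries (denominator n - 1).
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # (dx, dy) S^-1 (dx, dy)^T with the 2x2 inverse written out.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(d2)
    return distances


d2 = mahalanobis_squared(data)
most_unusual = data[d2.index(max(d2))]
print(most_unusual)
```

Neither the age of 22 nor the 20 years of marriage is extreme on its own; the distance is large because the combination departs from the pattern in the other respondents.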

Potential Outliers

  • Examine values that lie far from the variable's mean to determine whether they differ substantially from the other values.
  • Look for irregular combinations of values on variables, as these suggest potential outliers.
  • Scrutinize for data entry errors: A weight of 888 kg is suspicious and should be checked.
  • Assess whether respondents are outside of the expected population.
  • Check if the sample consists of multiple, distinct subgroups.

Handling Outliers

  • Strategies for handling outliers include minimizing their influence, transforming their values to lie closer to the mean, or deleting them if other factors permit.
  • Carefully consider whether removing outliers maintains the sample's randomness.
  • Always report both the analysis with and without outliers and the rationale behind decisions to help maintain transparency.
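
Reporting with and without outliers can be as simple as running the same summary twice. The Python sketch below uses made-up weights that include the 888 kg entry mentioned earlier; the 300 kg cut-off is a hypothetical screening rule chosen only for this example.

```python
from statistics import mean, median

# Made-up body weights in kg; 888 mimics a data-entry error.
weights = [62, 70, 75, 68, 81, 888]


def summarize(values):
    """Return (mean, median) so the two runs can be compared side by side."""
    return mean(values), median(values)


# Hypothetical rule for this sketch: weights above 300 kg are implausible.
without_outliers = [w for w in weights if w <= 300]

full_mean, full_median = summarize(weights)
trimmed_mean, trimmed_median = summarize(without_outliers)
print(full_mean, full_median)
print(trimmed_mean, trimmed_median)
```

The mean shifts dramatically between the two runs while the median barely moves, which is why both versions and the reason for the exclusion belong in the report.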

Analyses

  • Central tendency measures (mean, median, mode) summarize data distributions.
  • Histograms, boxplots, and various SPSS analysis options are used to understand and visualize the distributions of variables like age and sex.
  • Appropriate statistical tests (e.g., t-tests) have to be selected for investigating differences or comparing characteristics across groups or conditions.

More Than Two Independent Groups

  • A continuous dependent variable can be compared across more than two independent groups with the General Linear Model (GLM) procedure in SPSS, provided its assumptions are checked.
  • Assumptions include normality of residuals in each subgroup, absence of significant outliers, and equal variances of the dependent variable across subgroups.
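
The F-test behind a comparison of more than two groups can be sketched by hand. This Python example computes a one-way ANOVA F statistic for made-up scores in three hypothetical groups; it does not check the assumptions listed above and does not compute a p-value.

```python
from statistics import mean

# Made-up scores for three independent groups (e.g. education levels).
groups = {
    "low": [4, 5, 6],
    "middle": [6, 7, 8],
    "high": [9, 10, 11],
}


def one_way_anova_f(groups):
    """Return the F statistic and its degrees of freedom for a one-way ANOVA."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)
    # Between-groups sum of squares: group means around the grand mean.
    ss_between = sum(
        len(values) * (mean(values) - grand_mean) ** 2 for values in groups.values()
    )
    # Within-groups sum of squares: scores around their own group mean.
    ss_within = sum(
        (v - mean(values)) ** 2 for values in groups.values() for v in values
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df_between, df_within = one_way_anova_f(groups)
print(f_stat, df_between, df_within)
```

A large F means the group means differ by more than the spread within groups would suggest; statistical software then converts F and its degrees of freedom into a significance value.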

Histograms and Split Files

  • Splitting a file in SPSS lets users analyze and plot data individually for specific subgroups or groups based on categorical variables.
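
Conceptually, splitting a file is a group-by step. The Python sketch below uses made-up respondents and a hypothetical helper `split_by` to contrast a per-group analysis with an analysis of all cases; it is an analogy, not a reproduction of the SPSS procedure.

```python
from collections import defaultdict
from statistics import mean

# Made-up respondents; sex uses the 1 = male, 2 = female coding from the notes.
respondents = [
    {"sex": 1, "age": 30},
    {"sex": 2, "age": 40},
    {"sex": 1, "age": 50},
    {"sex": 2, "age": 20},
]


def split_by(rows, key):
    """Group rows by the value of one categorical variable."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


# Analysis per subgroup, comparable to output produced while the file is split.
mean_age_by_sex = {
    sex: mean(r["age"] for r in rows)
    for sex, rows in split_by(respondents, "sex").items()
}

# Analysis on all cases again, comparable to switching the split off.
overall_mean_age = mean(r["age"] for r in respondents)
print(mean_age_by_sex, overall_mean_age)
```

The last step matters: as long as the split stays active, every later analysis is still produced per group, so it should be switched off once the subgroup output is no longer wanted.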

Output ANOVA through GLM, Associations between Variables

  • Output from the General Linear Model (GLM) procedure, including ANOVA results and F-tests for analyses with more than two groups, is shown for different categories like respondents' education levels.
  • To analyze the relationship between two variables, choose the method by measurement level: Spearman correlation for ordinal data, Pearson correlation for continuous data, and the chi-squared test for nominal data. To prevent misinterpretation, always begin a correlation analysis by examining the scatterplot for patterns, outliers, and linearity.
  • Significance values from correlation tests represent the probability of obtaining a result at least as extreme as the observed one if there is no true relationship between the variables.
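
The difference between the two correlation coefficients can be seen in a small Python sketch. The data are made up so that y rises monotonically but not linearly with x; the helper names `pearson`, `ranks`, and `spearman` are assumptions, and no significance value is computed here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation: strength of the linear relationship."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Average ranks (1-based), so tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up data: y = x squared, a monotone but curved relationship.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
r = pearson(x, y)
rho = spearman(x, y)
print(round(r, 3), round(rho, 3))
```

Spearman's coefficient reaches 1 because the ordering is perfectly preserved, while Pearson's stays below 1 because the points do not lie on a straight line. A scatterplot would reveal the curvature immediately.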

χ² Tests

  • Chi-squared (χ²) tests indicate whether observed frequencies in a categorical variable differ from expected frequencies.
  • These tests can also check whether two categorical variables are independent of each other, based on the observations cross-classified by both variables.
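
The comparison of observed and expected frequencies can be worked through for a small crosstab. This Python sketch uses a made-up 2x2 table and a hypothetical function `chi_square`; it computes the plain Pearson chi-square statistic without a continuity correction and leaves the p-value to statistical software.

```python
# Made-up 2x2 crosstab: rows = sex (male, female), columns = answer (yes, no).
observed = [[30, 20], [20, 30]]


def chi_square(table):
    """Return the chi-square statistic and degrees of freedom for a crosstab."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


stat, df = chi_square(observed)
print(stat, df)
```

With 1 degree of freedom the critical value at the 5% level is about 3.84, so a statistic of 4.0 would lead to rejecting independence for these made-up counts.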

Crosstabs in SPSS

  • The Crosstabs procedure in SPSS creates cross-tabulation tables that show frequencies and percentages within groups of categorical variables; example variable selections are shown.

Related Documents

Lecture 5 2024 PDF

Description

Explore the techniques for coding survey data effectively, including straightforward and complex methods. Understand how to handle variables such as age, gender, and opinions, and learn about strategies for addressing missing values in surveys.
