Questions and Answers
Which test would be appropriate for examining the association between two nominal variables?
- Spearman correlation
- Pearson correlation
- ANOVA
- Crosstabs with chi-square test (correct)
What should be examined before analyzing correlations between continuous data?
- Descriptive statistics of the dataset
- Box plots for outliers
- Scatterplots of the variables (correct)
- Histograms of the data
Which of the following is NOT a requirement for performing a Pearson correlation?
- Linear relationship
- Continuous data
- Normality of residuals (correct)
- Observations must be independent
What does the homogeneity test assess in the context of ANOVA?
What is a critical step to take after using the 'Split file' function in data analysis?
What is the purpose of dividing a ranking question into smaller problems?
Which of the following represents user-missing values in survey data?
In conducting a survey, how should missing values be treated?
What is a characteristic of straightforward coding?
Which course could be considered the easiest based on a typical coding framework?
What numerical code is assigned to female respondents in the questionnaire?
Which method provides information about categorical variables?
What score represents the distance in multivariate space for respondents?
What could cause an outlier such as a weight of 888 kg in a dataset?
What should be examined to identify potential outliers in the dataset?
Which regression model specification helps in saving the Mahalanobis distance?
What type of outlier is identified by examining one variable at a time?
What is primarily used for analyzing continuous variables?
What is a potential consequence of removing outliers from a sample?
Which measure of central tendency is least affected by skewness in data?
When conducting an Independent Samples T-Test in SPSS, what is the primary dependent variable being analyzed?
What would you use as a summary measure for nominal data?
What is one of the advantages of reporting analyses both with and without outliers?
What should you be cautious of when analyzing data for central tendency?
In the context of analyzing more than two independent groups, which model is appropriate for a continuous dependent variable?
Which of the following is NOT a recommended action when dealing with outliers?
Flashcards
Coding in Surveys
Assigning numerical values to responses in a survey, allowing for analysis. This can involve simple categories or more complex scales.
Forced Choice Coding
The process of simplifying a ranking question by presenting pairs of choices, making it easier for respondents to select their preference.
Missing Values in Surveys
Data points where information is unavailable, either due to missing input or the respondent's deliberate choice not to provide it.
Explicit Missing Value Codes
String Values in Surveys
Outliers
Mode
Boxplot
Independent Samples T-Test
General Linear Model (GLM) - Univariate
Analyze with and without outliers
Median
Mean
Mahalanobis Distance
Bivariate Outlier
Multivariate Outlier
Multivariate Screening
Bivariate Screening
Screening per Variable
Data Screening
Homogeneity Test
Chi-Square (χ²) Test
Goodness-of-Fit Test
Association Between Variables
ANOVA (Analysis of Variance)
Study Notes
Coding and Screening for Surveys
- Straightforward coding involves assigning clear numerical values for variables like age, gender, and opinions.
- Age is coded as "age in years" in SPSS.
- Gender is coded as 1 = male, 2 = female in SPSS.
- Opinions are coded as 1 = strongly disagree, 2 = disagree, 3 = neutral, 4 = agree, 5 = strongly agree in SPSS.
Not So Straightforward Coding
- Coding more complex questions, like ranking courses by difficulty, requires careful design.
- Forced choice format breaks rankings into smaller, pairwise comparisons.
- This allows translating one ranking question into multiple distinct variables.
- Multiple answers are allowed when asking which courses participants liked most from a list.
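The forced-choice idea can be sketched in a few lines of Python. This is only an illustration: the course names, the `harder_..._vs_...` variable names, the 1/2 coding, and the respondent's ranking are all hypothetical and do not come from the questionnaire in the notes.

```python
from itertools import combinations

# Hypothetical course list; the notes do not name the courses.
courses = ["Statistics", "Methods", "Ethics", "Theory"]

# One forced-choice item (and therefore one variable) per unordered pair.
pairs = list(combinations(courses, 2))


def variable_name(a, b):
    """Build a hypothetical variable name for the pair (a, b)."""
    return f"harder_{a.lower()}_vs_{b.lower()}"


# Hypothetical respondent, ordered from hardest to easiest course.
ranking = ["Statistics", "Methods", "Theory", "Ethics"]

# Code each pair as 1 if the first course is judged harder, 2 otherwise.
responses = {
    variable_name(a, b): 1 if ranking.index(a) < ranking.index(b) else 2
    for a, b in pairs
}

# Count how often each course is chosen as the harder one of a pair.
wins = {course: 0 for course in courses}
for a, b in pairs:
    harder = a if responses[variable_name(a, b)] == 1 else b
    wins[harder] += 1

print(len(pairs), "pairwise variables")
print(wins)
```

With four courses the single ranking question becomes six pairwise variables, and the course with the fewest "harder" choices would be read as the easiest.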
Some Remarks about Missing Values
- Missing values in surveys can be left empty (system-missing) or given an explicit code (user-missing).
- These codes can differentiate between different types of missing data (e.g., "not applicable").
- Examples of missing value codes include 97 = does not apply, 98 = don't know, and 99 = refused to answer.
- Using numerical codes (e.g., 1 = male, 2 = female) and value labels is recommended for variables like gender.
- Assign a unique identifier to each participant for tracking and data analysis.
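A minimal sketch of why the missing-value codes need to be declared before analysis. The codes 97, 98, and 99 are taken from the notes; the age values, the use of `None` for an empty cell, and the function name `split_missing` are assumptions made for illustration.

```python
# User-missing codes as listed in the notes.
MISSING_CODES = {97: "does not apply", 98: "don't know", 99: "refused to answer"}


def split_missing(values):
    """Return (valid values, count per missing reason).

    None stands in for an empty cell (system-missing).
    """
    valid = []
    reasons = {}
    for value in values:
        if value is None:
            reasons["system missing"] = reasons.get("system missing", 0) + 1
        elif value in MISSING_CODES:
            label = MISSING_CODES[value]
            reasons[label] = reasons.get(label, 0) + 1
        else:
            valid.append(value)
    return valid, reasons


# Hypothetical age column containing two user-missing codes and one empty cell.
ages = [23, 35, 99, 41, None, 98, 29]

valid_ages, missing_summary = split_missing(ages)

# Treating 98 and 99 as real ages inflates the mean.
entered = [a for a in ages if a is not None]
naive_mean = sum(entered) / len(entered)
clean_mean = sum(valid_ages) / len(valid_ages)

print(missing_summary)
print(round(naive_mean, 2), clean_mean)
```

If the codes are not declared as missing, they enter every computation as if they were genuine answers, which is why the naive mean here is far higher than the mean of the valid ages.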
Screening Example Data
- Example values for the variables sex, age, anxiety, IQ, married, and income are given in a table.
- Age is in years.
- Anxiety is on a 1-7 ordinal scale.
- Married is the number of years married.
- Income is in 5 categories (1 = ≤ 1500, 2 = 1501-2500, ..., 5 = > 5000).
- Other variables, with ranges and descriptive statistics like means and standard deviations, are shown.
Screening per Variable
- Descriptive statistics for several variables are presented in a table, including number of brothers/sisters, number of children, age, education years completed (self, father, mother, spouse), R’s occupation prestige score, and occupational category.
- Data summaries are provided for continuous and categorical variables, including frequency distributions and cross-tables.
- Additional categorical data for respondents' sex and most important problems in the last 12 months (e.g. Finance, Health, Lack of Basic services) is included.
Bivariate Screening
- Bivariate screening checks for unexpected combinations of values in pairs of variables.
- A scatterplot is useful for visualizing relationships between continuous variables and identifying potential outliers.
Multivariate Screening
- Multivariate screening examines multiple variables together to identify outliers.
- Mahalanobis distance measures how far a respondent lies from the average respondent (the centroid) in multidimensional space, taking the correlations between the variables into account.
- High Mahalanobis distances indicate potential outliers.
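As a rough sketch, the squared Mahalanobis distance for two variables can be computed by hand. The data below (age and years married) are hypothetical, and the function name `mahalanobis_2d` is an assumption; this is not the SPSS procedure, only the underlying formula for the two-variable case.

```python
from statistics import mean


def mahalanobis_2d(xs, ys):
    """Squared Mahalanobis distance of each (x, y) case from the centroid."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    # Sample variances and covariance (denominator n - 1).
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2
    distances = []
    for x, y in zip(xs, ys):
        dx, dy = x - mx, y - my
        # d^2 = [dx dy] S^-1 [dx dy]^T with the 2x2 inverse written out.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(d2)
    return distances


# Hypothetical respondents: the last one is 28 years old and 25 years married.
age = [25, 30, 35, 40, 45, 50, 55, 28]
married = [2, 5, 10, 15, 20, 25, 30, 25]

d2 = mahalanobis_2d(age, married)
most_unusual = d2.index(max(d2))

for case, value in enumerate(d2):
    print(case, round(value, 2))
print("largest distance: case", most_unusual)
```

Neither value of the last respondent is extreme on its own, yet the combination is far from the pattern in the rest of the data, which is exactly the kind of case a per-variable check would miss.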
Potential Outliers
- Examine values that lie far from the mean to determine whether they differ markedly from the other values.
- Look for irregular combinations of values across variables, as these suggest potential outliers.
- Scrutinize the data for entry errors: a weight of 888 kg is suspicious and should be checked.
- Assess whether respondents fall outside the expected population.
- Check if the sample consists of multiple, distinct subgroups.
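One way to flag univariate candidates is the boxplot rule, sketched below. The 1.5 × IQR fences are a common convention rather than something the notes prescribe, and the weights (apart from the 888 kg example) are hypothetical.

```python
from statistics import quantiles

# Hypothetical weights in kg, including the suspicious 888 from the notes.
weights = [62, 70, 75, 68, 81, 59, 77, 888, 73, 66]

# Quartiles of the sample; quantiles() returns [Q1, Q2, Q3] for n=4.
q1, _, q3 = quantiles(weights, n=4)
iqr = q3 - q1

# Boxplot-style fences: values beyond them are candidates for inspection.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

flagged = [w for w in weights if w < lower_fence or w > upper_fence]

print("fences:", lower_fence, upper_fence)
print("flagged:", flagged)
```

A flagged value is only a candidate: whether it is a data entry error, a respondent from outside the target population, or a genuine extreme still has to be checked against the source.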
Handling Outliers
- Strategies for handling outliers include minimizing their influence, transforming their values to bring them closer to the mean, or deleting them if the circumstances permit.
- Carefully consider whether removing outliers maintains the sample's randomness.
- Report the analysis both with and without outliers, along with the rationale behind each decision, to keep the work transparent.
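A small sketch of reporting a summary both ways. The weights are hypothetical (only the 888 kg value echoes the notes), and the helper name `summarize` is an assumption.

```python
from statistics import mean, median

# Hypothetical weights in kg; 888 is the suspected data entry error.
weights = [62, 70, 75, 68, 81, 59, 77, 888, 73, 66]


def summarize(values):
    """Return the sample size, mean, and median of a list of values."""
    return {"n": len(values), "mean": mean(values), "median": median(values)}


with_outlier = summarize(weights)
without_outlier = summarize([w for w in weights if w != 888])

print("with outlier:   ", with_outlier)
print("without outlier:", without_outlier)
```

The mean more than doubles when the outlier is kept, while the median barely moves, so showing both versions makes clear how much the conclusion depends on a single case.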
Analyses
- Central tendency measures (mean, median, mode) summarize data distributions.
- Histograms, boxplots, and various SPSS analysis options are used to understand and visualize the distributions of variables like age and sex.
- Appropriate statistical tests (e.g., t-tests) have to be selected for investigating differences or comparing characteristics across groups or conditions.
More Than Two Independent Groups
- Differences between more than two groups can be analyzed with the General Linear Model (GLM) in SPSS, provided its assumptions are checked.
- Assumptions include normality of residuals in each subgroup, absence of significant outliers, and equal variances of the dependent variable across subgroups.
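The F-test behind a one-way ANOVA can be sketched by hand. The three groups of scores are hypothetical, the function name `one_way_anova` is an assumption, and comparing the group variances by eye stands in for a formal homogeneity test, which this sketch does not perform.

```python
from statistics import mean, variance


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    all_scores = [x for group in groups for x in group]
    grand_mean = mean(all_scores)
    k, n = len(groups), len(all_scores)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical scores for three independent groups.
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]

f_value, df_between, df_within = one_way_anova(groups)
group_variances = [variance(g) for g in groups]

print("F =", f_value, "with df", df_between, "and", df_within)
print("group variances:", group_variances)
```

Here the group variances are identical, so the equal-variance assumption is trivially met; with real data the spread of these variances is what a homogeneity test evaluates.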
Histograms and Split Files
- Splitting a file in SPSS lets users analyze and plot data individually for specific subgroups or groups based on categorical variables.
Output ANOVA through GLM, Associations between Variables
- Output from the General Linear Model (GLM) procedure, including ANOVA results and F-tests for analyses with more than two groups, is shown for different categories like respondents' education levels.
- The method for analyzing the relationship between two variables depends on the measurement level: Spearman correlation for ordinal data, Pearson correlation for continuous data, and the chi-squared test for nominal data. To prevent misinterpretation, a correlation analysis should begin with a scatterplot to check for patterns, outliers, and linearity.
- The significance value of a correlation test is the probability of obtaining a result at least as extreme as the observed one if there is no true relationship between the variables.
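The difference between the two correlation coefficients can be sketched as follows. The data are hypothetical, and the simple `ranks` helper assumes there are no tied values; tied data would need average ranks, which this sketch does not handle.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation: strength of the linear relationship."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def ranks(values):
    """Rank values from 1 (smallest). Assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical data: y rises with x, but along a curve rather than a line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)

print("Pearson: ", round(r_pearson, 3))
print("Spearman:", round(r_spearman, 3))
```

Spearman reaches its maximum because the ordering is perfectly preserved, while Pearson falls slightly short because the relationship is not linear; a scatterplot of these points would show the curvature directly.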
χ² Tests
- Chi-squared (χ²) tests indicate whether observed frequencies in a categorical variable differ from expected frequencies.
- These tests can also check whether two categorical variables are independent of each other, based on the cell frequencies in their cross-table.
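A minimal sketch of the χ² statistic for a cross-table, computed from observed and expected frequencies. The 2 × 2 table is hypothetical, the function name `chi_square` is an assumption, and no p-value or continuity correction is computed here.

```python
def chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for a cross-table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables are independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Hypothetical counts: rows are two groups, columns are two answer categories.
table = [
    [30, 10],
    [20, 40],
]

statistic, df = chi_square(table)
print("chi-square =", round(statistic, 3), "df =", df)
```

With one degree of freedom the critical value at the 5% level is 3.841, so a statistic of this size would lead to rejecting independence for these made-up counts.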
Crosstabs in SPSS
- The Crosstabs procedure in SPSS creates cross-tabulation tables that show frequencies and percentages for combinations of categorical variables; the notes illustrate it with example variable options.
Description
Explore the techniques for coding survey data effectively, including straightforward and complex methods. Understand how to handle variables such as age, gender, and opinions, and learn about strategies for addressing missing values in surveys.