URBS 260 Quantitative Analysis Methods
Donny Seto
Summary
This document is a lecture or presentation on Quantitative Analysis. It covers topics like types of variables, univariate and bivariate analysis, and statistical significance. The document is designed for undergraduates in an urban studies class.
Full Transcript
URBS 260 ANALYTICAL METHODS IN URBAN STUDIES
QUANTITATIVE DATA ANALYSIS
TEACHER: DONNY SETO

TODAY’S AGENDA
• Research Question Feedback
• Intro to Quantitative Methods
• Types of Variables
• Univariate Analysis
• Bivariate Analysis & Statistical Significance

RESEARCH QUESTIONS FEEDBACK
• Posted as a comment on your research questions submission.
• Please review and think about reworking your research question as required.

INTRODUCTION
• The biggest mistake in quantitative research is to think that data analysis decisions can wait until after the data have been collected.
• You must be fully aware of which analysis techniques will be used before data collection begins.
• The questionnaire, observation schedule, and coding frame should be designed with the data analysis in mind.
• The statistical techniques that can be used depend on how a variable is measured.
• Inappropriate measurement may make it impossible to conduct certain types of data analysis.
• The size and nature of the sample also impose limitations on the kinds of techniques that are suitable for the data set.

TYPES OF VARIABLES
Nominal: The only difference that exists between participants is being in one category or another.
• Categories cannot be ordered by rank.
• Cannot do arithmetic or mathematical operations with the categories.
• e.g., gender: categories 'male' and 'female'
Ordinal: The categories of the variable can be rank ordered.
• e.g., high enthusiasm, moderate enthusiasm, low enthusiasm (Likert scale)
• Distance or amount of difference between categories may not be equal.
• Cannot do arithmetic or mathematical operations with the categories.
Interval/Ratio: Distance or amount of difference between categories is uniform (e.g., 0 siblings, 1 sibling, 2 siblings, etc.).
• Can do arithmetic and mathematical operations with the categories (e.g., 1 sibling + 3 siblings = 4 siblings).
• Ratio variables have a '0' start position.

UNIVARIATE ANALYSIS
• Analysis of one variable at a time.
• Often, the first step in the analysis is to create frequency tables for the variables of interest.
• Frequency tables show the number of times each value of a variable occurs, expressed as an actual number and as a percentage of the whole.
  • e.g., 36 per cent of participants …
• When interval/ratio variables are shown in frequency tables, some of the categories may be combined as long as they don't overlap (e.g., age groups of 20–29, 30–39, …).
• Combining categories makes the data more manageable and easier to comprehend.
• Diagrams can be used to illustrate frequency distributions.
  • Use bar charts and pie charts for displaying a nominal or ordinal variable.
  • Use histograms for an interval/ratio variable.

UNIVARIATE ANALYSIS, CONT’D
(Figures not reproduced. Source: Bryman, 2019)

MEASURES OF CENTRAL TENDENCY
An average or typical score for the group
• Mode: The score that occurs most often
  • Can be used with all variable types
  • Most applicable to nominal data
• Median: The middle score when all scores have been arrayed in order (if there is an even number of scores, it is the mean of the two middle scores)
  • Can be used with ordinal or interval/ratio data
  • Less influenced by outliers than the mean
• Mean: Sum of all scores, divided by the number of scores
  • Can be used with interval/ratio data
  • Vulnerable to outliers (extreme scores)
  • In Excel, the AVERAGE function returns the mean

MEASURES OF DISPERSION
The amount of variation in a sample
• Range: Highest score minus lowest score
  • Shows the influence of outliers
• Standard deviation: Measures the amount of variation around the mean
  • Influenced by outliers

BIVARIATE ANALYSIS & STATISTICAL SIGNIFICANCE
These tests determine whether there is a relationship between two variables. However, establishing a relationship is not proof of causality.
(Figure not reproduced. Source: Bryman, 2019)

BIVARIATE ANALYSIS, CONT’D

CONTINGENCY TABLES (CROSSTABULATIONS)
• Allow simultaneous analysis of two variables
• Identify patterns of association
• Can be used for any variable type
• Normally used for nominal or ordinal data
• (Note: the independent variable is normally displayed as the column variable)

Stage 1 - Contingency Table

               Condo
Bedrooms       No     Yes    Grand Total
1               1       4              5
2               9       6             15
3              11                     11
4              18       1             19
5               4                      4
6               2                      2
Grand Total    45      11             56

PEARSON'S R
• Normally used with interval/ratio data
• Values range from -1 (perfect negative relationship) through 0 (no relationship) to +1 (perfect positive relationship)
• The relationship between the variables should be approximately linear if Pearson's r is to be used in a study.
• This can be established using a scatter plot.

KENDALL'S TAU-B & SPEARMAN'S RHO
• Kendall's tau-b
  • Shows correlation between pairs of ordinal variables, or between one ordinal and one interval/ratio variable.
  • Like Pearson's r, values range from 0 to ±1.
• Spearman's rho
  • Shows correlation between pairs of ordinal variables.
  • Like Pearson's r, values range from 0 to ±1.
  • Can be used to predict a rank position on one variable from the rank position on the other.

CRAMÉR'S V
• Shows the strength of the relationship between two nominal variables.
• Values range from 0 to 1. (Nominal categories cannot be rank ordered, so there is no negative direction.)
• Usually reported with a contingency table and a chi-square test.

COMPARING MEANS AND ETA
• Used with an interval/ratio variable and a nominal variable.
• The nominal variable is the independent variable.
• Compare the means of the interval/ratio variable for each subgroup of the nominal variable.
• Eta indicates the level of association between the two variables.
• Values range from 0 to 1. (Nominal categories cannot be rank ordered, so there is no negative direction.)
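To make the comparing-means idea concrete, here is a minimal sketch of how eta could be computed by hand. The rent figures and housing categories are invented for illustration and are not course data; the function name `eta` is also just a label chosen here. The sketch assumes the usual definition of eta as the square root of the between-group sum of squares divided by the total sum of squares.

```python
from math import sqrt

def eta(groups):
    """Return eta for a dict mapping each nominal category to a list of scores.

    eta = sqrt(between-group sum of squares / total sum of squares)
    """
    all_scores = [x for scores in groups.values() for x in scores]
    grand_mean = sum(all_scores) / len(all_scores)
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    if ss_total == 0:
        raise ValueError("all scores are identical, so eta is undefined")
    # Each subgroup contributes its size times the squared distance
    # between its own mean and the grand mean.
    ss_between = sum(
        len(scores) * (sum(scores) / len(scores) - grand_mean) ** 2
        for scores in groups.values()
    )
    return sqrt(ss_between / ss_total)

# HYPOTHETICAL data: monthly rent (interval/ratio) by housing type (nominal).
rent_by_type = {
    "apartment": [1000, 1100, 1200],
    "condo": [1500, 1600, 1700],
    "house": [2000, 2100, 2200],
}

# Step 1: compare the subgroup means.
group_means = {k: sum(v) / len(v) for k, v in rent_by_type.items()}
print("Subgroup means:", group_means)

# Step 2: summarize the strength of association with eta (0 to 1).
eta_value = eta(rent_by_type)
print("eta =", round(eta_value, 3), " eta squared =", round(eta_value ** 2, 3))
```

In this made-up example the subgroup means are far apart relative to the spread inside each subgroup, so eta comes out close to 1. If the three housing types had the same mean rent, eta would be 0.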
AMOUNT OF EXPLAINED VARIANCE
• Applies to eta, Kendall's tau-b, Spearman's rho, and Pearson's r.
• Squaring the coefficient shows how much of the variation in one variable is explained by variation in the other variable.
• Allows prediction of the second variable based on the score from the first.

STATISTICAL SIGNIFICANCE
• Can a sample finding be used to estimate a characteristic of the whole population?
• Stated as a probability level (p): how likely it is that a result like this would arise by chance alone if there were no relationship in the population.
• A null hypothesis is used to test the significance of the bivariate association.

To test for statistical significance (process):
1. Set up a null hypothesis (e.g., state that there is no relationship between two variables, or that two populations do not differ on some characteristic).
2. Establish an acceptable level of significance.
  • It must be .05 or lower (p ≤ .05), the maximum acceptable in social research.
3. If the null hypothesis is correct, there is no relationship.
4. If the null hypothesis is rejected and the statistical significance (p) of the findings is ≤ .05, there is indirect support for the research hypothesis.
  • It is unlikely that the results occurred by chance.

STATISTICAL SIGNIFICANCE: TWO TYPES OF ERRORS
Type I: rejecting a true null hypothesis
• The results are a chance association.
Type II: not rejecting a false null hypothesis
• For a given sample size, the two types of error are inversely related, so they cannot both be minimized at the same time.
• If one is low, the other is high.
• Researchers usually choose to minimize the Type I error over the Type II.

CORRELATION AND STATISTICAL SIGNIFICANCE
The significance of a Pearson's r or a Kendall's tau-b correlation coefficient is determined by
• the size of the coefficient; and
• the sample size.
Correlation and statistical significance must be weighed together: statistical significance only speaks to the results not occurring by chance alone. It does not speak to the importance of the results.
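The point that significance depends on both the size of the coefficient and the sample size can be illustrated with a short sketch. It uses the standard t test for Pearson's r, t = r·√(n − 2) / √(1 − r²) with n − 2 degrees of freedom. This formula is not on the slides and applies to Pearson's r only, since Kendall's tau-b is tested differently. The value r = 0.30 and the two sample sizes are invented for illustration, and the critical values are copied from a standard two-tailed t table at the .05 level.

```python
from math import sqrt

def t_for_pearson_r(r, n):
    """t statistic for the null hypothesis of no linear relationship."""
    if n < 3 or not -1 < r < 1:
        raise ValueError("need n >= 3 and -1 < r < 1")
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

# Two-tailed critical t values at the .05 level, keyed by degrees of freedom
# (taken from a standard t table; only the two needed here are listed).
CRITICAL_T_05 = {8: 2.306, 98: 1.984}

r = 0.30  # HYPOTHETICAL correlation, identical in both scenarios
print("Explained variance (r squared):", round(r ** 2, 2))

significant = {}
for n in (10, 100):
    df = n - 2
    t = t_for_pearson_r(r, n)
    significant[n] = abs(t) > CRITICAL_T_05[df]
    print(f"n = {n}: t = {t:.2f}, df = {df}, significant at .05: {significant[n]}")
```

The same modest correlation fails the test with 10 cases and passes it with 100, while the explained variance stays at 9 per cent in both. That is why a significant result still has to be weighed against the size of the coefficient.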
STATISTICAL SIGNIFICANCE: CHI-SQUARE (χ²)
• Used with contingency tables.
• Measures the likelihood that a relationship between the two variables exists in the population.
• Calculated by comparing the observed frequency in each cell with what would be expected by chance (if there were no relationship between the variables).
• The chi-square value is affected by the sample size.

STATISTICAL SIGNIFICANCE: COMPARING MEANS
• Analysis of variance (F statistic)
  • Starts from the total amount of variation in the dependent variable.
  • A significant F indicates a reduced likelihood that there is no relationship between the set of independent variables and the dependent variable.
  • Reported with its statistical significance as a probability (p).
• The F statistic compares
  • the mean of the explained variation (variation between the subgroups in the independent variable)
  • with the mean of the error variation (variation within each of the subgroups that make up the independent variable).
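Returning to the chi-square slide above, the observed-versus-expected calculation can be sketched using the counts from the Stage 1 contingency table (bedrooms by condo). Two assumptions are made here: blank cells in that table are treated as zero counts, and the critical value 11.070 is taken from a standard chi-square table for 5 degrees of freedom at the .05 level. Cramér's V is computed with the usual formula √(χ² / (N · (min(rows, columns) − 1))), which the slides name but do not spell out.

```python
from math import sqrt

# Observed counts from the Stage 1 contingency table.
# Rows: bedrooms 1 to 6. Columns: condo = No, Yes.
# ASSUMPTION: blank cells in the original table are zero counts.
observed = [
    [1, 4],
    [9, 6],
    [11, 0],
    [18, 1],
    [4, 0],
    [2, 0],
]

def chi_square(table):
    """Return (chi-square, N) for a table with no empty row or column."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Frequency expected by chance if the variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (obs - expected) ** 2 / expected
    return chi2, n

def cramers_v(chi2, n, n_rows, n_cols):
    """Strength of association between two nominal variables (0 to 1)."""
    return sqrt(chi2 / (n * (min(n_rows, n_cols) - 1)))

n_rows, n_cols = len(observed), len(observed[0])
chi2, n = chi_square(observed)
df = (n_rows - 1) * (n_cols - 1)
v = cramers_v(chi2, n, n_rows, n_cols)

CRITICAL_CHI2_05_DF5 = 11.070  # from a standard chi-square table
significant = chi2 > CRITICAL_CHI2_05_DF5

print(f"N = {n}, df = {df}, chi-square = {chi2:.2f}, Cramer's V = {v:.2f}")
print("Significant at .05:", significant)
```

Under these assumptions chi-square comes out at roughly 22.1, which exceeds the critical value, and V is about 0.63. Several expected frequencies in this table are well below 5, so the result should be read as an illustration of the arithmetic rather than as a reliable test.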