Questions and Answers
What is the name of the method introduced by John Tukey used in Exploratory Data Analysis?
Median Polish
What kind of table does Median Polish use to extract effects?
Two-way table
Median Polish is robust to outliers because it uses medians instead of means.
True
Median Polish is more robust than ANOVA for identifying the significance of factors in a multifactor model.
How is Median Polish used to analyze data and identify the role of each factor?
Describe the goal of the first step in conducting Median Polish according to John Tukey.
What does the overall effect represent in the second step of Median Polish?
What is the name of the formula used to represent the fit for each cell in a two-way table with row 'i' and column 'j'?
What is the name of the formula used to represent the residual for each cell in a two-way table with row 'i' and column 'j'?
What is the name of the parameter used in the medpolish() function that allows you to control the maximum number of iterations?
The medpolish() function assumes the data is pre-formatted in a long form.
The eda_pol() function requires that the data be in long format.
What kind of output does the eda_pol() function provide?
What is the name of the argument used in the eda_pol() function to sort the output values by their effect size?
What kind of object is the df.pol object in R?
What are the three key output values stored in the df.pol object?
How do you access the global effect stored in the df.pol object?
How do you access the row effects stored in the df.pol object?
Study Notes
STT157 Exploratory Data Analysis (EDA)
- The course is about exploratory data analysis (EDA)
- The subject is Median Polish, a data analysis technique
Median Polish
- Median polish is a method introduced by John Tukey
- It's a simple and robust technique in EDA
- It's used to extract effects from a two-way table
- The method is robust to outliers because it uses medians instead of means for analysis
- It's more robust than ANOVA for assessing the significance of the different factors in a multi-factor model
- It iteratively extracts row and column effects to characterize the factors contributing to the expected value.
- The aim is to pinpoint each factor's role by progressively deducting the row and column effects.
Steps for Conducting Median Polish
- Take the median of each row and record it next to the row. Subtract the row median from every value in that row.
- Compute the median of row medians, consider it the overall effect, and subtract this effect from each row median.
- Take each column's median, record beneath the column, and subtract each column median from the values in that particular column.
- Compute the median of the column medians and add it to the current overall effect. Subtract that same median (the amount just added, not the accumulated overall effect) from the column medians.
- Repeat the four steps above until no changes occur in the row or column medians.
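
The steps above can be sketched by hand in base R. This is a minimal illustration, not the implementation used by any package: the 3 x 3 table is invented, and the names res, row_eff, col_eff and shift are hypothetical. Row and column effects are accumulated across passes, and the loop stops once a pass produces only zero medians.

```r
# Hypothetical 3 x 3 two-way table (values invented for illustration)
x <- matrix(c(10, 12, 15,
              11, 14, 16,
              20, 13, 18),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("r1", "r2", "r3"), c("c1", "c2", "c3")))

res     <- x                  # residuals start as the raw data
overall <- 0                  # overall (common) effect
row_eff <- numeric(nrow(x))   # accumulated row effects
col_eff <- numeric(ncol(x))   # accumulated column effects

for (iter in 1:10) {
  # Step 1: take each row's median and subtract it from that row
  row_med <- apply(res, 1, median)
  res     <- sweep(res, 1, row_med)
  row_eff <- row_eff + row_med

  # Step 2: move the median of the row effects into the overall effect
  shift   <- median(row_eff)
  overall <- overall + shift
  row_eff <- row_eff - shift

  # Step 3: take each column's median and subtract it from that column
  col_med <- apply(res, 2, median)
  res     <- sweep(res, 2, col_med)
  col_eff <- col_eff + col_med

  # Step 4: move the median of the column effects into the overall effect
  shift   <- median(col_eff)
  overall <- overall + shift
  col_eff <- col_eff - shift

  # Stop once a full pass leaves every row and column median at zero
  if (all(abs(c(row_med, col_med)) < 1e-8)) break
}

# The decomposition reproduces the data: overall + row + column + residual
fit <- overall + outer(row_eff, col_eff, "+")
```

At any point in the loop, adding overall, the row effect, the column effect and the residual for a cell gives back the original value, which is a useful sanity check when working through the steps on paper.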
Fit for Each Cell
- The fit for each cell (row i and column j) is equal to a common term plus the row effect (i) plus the column effect (j).
- Residuals (residualᵢⱼ) are the differences between the raw data (dataᵢⱼ) and the fitted values (fitᵢⱼ): residualᵢⱼ = dataᵢⱼ - fitᵢⱼ
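
As a rough check of these two definitions, the fit can be rebuilt from the components that R's medpolish() returns (overall, row, col, residuals). The table below is invented for illustration only.

```r
# Hypothetical two-way table (values invented for illustration)
x <- matrix(c(10, 12, 15,
              11, 14, 16,
              20, 13, 18),
            nrow = 3, byrow = TRUE)

mp <- medpolish(x, trace.iter = FALSE)

# fit_ij = common term + row effect i + column effect j
fit <- mp$overall + outer(mp$row, mp$col, "+")

# residual_ij = data_ij - fit_ij; this should match mp$residuals
residual <- x - fit
```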
Model
- The resulting model is additive; the response variable (yᵢⱼ) is equal to the common value (μ) plus the row effect (αᵢ) plus the column effect (βⱼ) plus the residual (εᵢⱼ).
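
Written out with the symbols used above, the additive model and its link to the fit and residual can be stated as follows (the fit and residual labels are carried over from the previous section):

```latex
y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}

\mathrm{fit}_{ij} = \mu + \alpha_i + \beta_j,
\qquad
\mathrm{residual}_{ij} = y_{ij} - \mathrm{fit}_{ij} = \varepsilon_{ij}
```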
Example (Infant Mortality)
- The example data present infant mortality by region and father's education level in the United States (1964-1966)
- Data is reported as the number of deaths per 1000 live births.
Stopping Iteration
- Iterate through the row and column smoothing operations until the medians of the row and column effects approach zero.
- Do not iterate indefinitely: as suggested by Hoaglin et al. (1983), a few steps are sufficient.
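
One way to see why a few passes are usually enough is to cap the number of iterations and watch the sum of absolute residuals. The sketch below uses base R's medpolish() on an invented table. suppressWarnings() hides the warning that medpolish() raises when it stops before converging.

```r
# Hypothetical 4 x 3 two-way table (values invented for illustration)
x <- matrix(c(25, 18, 16,
              32, 24, 19,
              39, 15, 17,
              21, 20, 18),
            nrow = 4, byrow = TRUE)

# Sum of absolute residuals after at most k iterations, for k = 1..5
sar <- sapply(1:5, function(k) {
  mp <- suppressWarnings(medpolish(x, maxiter = k, trace.iter = FALSE))
  sum(abs(mp$residuals))
})

sar  # never increases with k, and typically flattens after a few passes
```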
Implementing Median Polish in R
- R has a built-in function, medpolish(), to implement median polish.
- Users can set the maxiter parameter to define the maximum number of iterations. By default, maxiter is 10, and the function stops earlier once the proportional reduction in the sum of absolute residuals falls below eps (0.01 by default).
- The data frame needs to be loaded first; the course notes provide sample R code for this.
- The medpolish() function returns a result, stored here as df.med.
- The console output from the medpolish() function shows the sum of absolute residuals at each step of the iteration.
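
A minimal sketch of such a call is shown below. The data frame and its row and column labels are invented, not the course data. Only the object name df.med follows the notes.

```r
# Hypothetical data in wide (two-way table) form; values are invented
df <- data.frame(low  = c(30, 41, 52, 33),
                 mid  = c(22, 27, 24, 29),
                 high = c(14, 21, 18, 16),
                 row.names = c("g1", "g2", "g3", "g4"))

# medpolish() works on a numeric matrix; maxiter caps the number of passes.
# With the default trace.iter = TRUE, the sum of absolute residuals is
# printed to the console after each iteration.
df.med <- medpolish(as.matrix(df), maxiter = 5)

df.med$overall    # common (overall) effect
df.med$row        # row effects
df.med$col        # column effects
df.med$residuals  # residual table
```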
Using eda_pol
- The tukeyedar package provides a custom function, eda_pol.
- eda_pol outputs the polished table as a graphic element and requires the data in long format, unlike the basic medpolish() function, which takes the two-way table itself (a numeric matrix).
- Example R code demonstrating how to use eda_pol is provided in the course notes.
- The df.pol object contains the values of the polished table, including the common, row, and column effects.
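
Because eda_pol expects long-format input, a wide two-way table usually has to be reshaped first. The sketch below shows one base R way to do that. The table and the column names region, level and value are hypothetical. The eda_pol call itself is left out, since its exact arguments should be checked against the tukeyedar documentation.

```r
# Hypothetical wide two-way table (values invented for illustration)
wide <- matrix(c(10, 12, 15,
                 11, 14, 16),
               nrow = 2, byrow = TRUE,
               dimnames = list(region = c("north", "south"),
                               level  = c("low", "mid", "high")))

# Long format: one row per cell, with columns for the row factor,
# the column factor, and the measured value
long <- as.data.frame(as.table(wide), responseName = "value")

long
```

Each row of long holds one cell of the original table, so a 2 x 3 table becomes six rows.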
Description
This quiz covers the concept of Median Polish, a technique in exploratory data analysis introduced by John Tukey. It focuses on how this method robustly extracts effects from two-way tables and the steps involved in conducting the analysis. Test your understanding of its significance and applications in EDA.