STT157 Exploratory Data Analysis: Median Polish
18 Questions

Questions and Answers

What is the name of the Exploratory Data Analysis method, introduced by John Tukey, that this lesson covers?

Median Polish

What kind of table does Median Polish use to extract effects?

Two-way table

Median Polish is robust to outliers because it uses medians instead of means.

True

Median Polish is more robust to outliers than ANOVA, which estimates row and column effects with means.

True

How is Median Polish used to analyze data and identify the role of each factor?

By iteratively extracting the effects associated with the row and column factors via medians.

Describe the goal of the first step in conducting Median Polish according to John Tukey.

Take the median of each row and subtract it from every value in that row.

What does the overall effect represent in the second step of Median Polish?

The median of the row medians.

What formula gives the fit for each cell in a two-way table with row i and column j?

fit_ij = common_term + row_effect_i + column_effect_j

What formula gives the residual for each cell in a two-way table with row i and column j?

residual_ij = data_ij - fit_ij

What is the name of the parameter used in the medpolish() function that allows you to control the maximum number of iterations?

maxiter

The medpolish() function assumes the data is pre-formatted in a long form.

False

The eda_pol() function requires that the data be in long format.

True

What kind of output does the eda_pol() function provide?

A graphic representation of the polished table.

What is the name of the argument used in the eda_pol() function to sort the output values by their effect size?

sort

What kind of object is the df.pol object in R?

A list

What are the three key output values stored in the df.pol object?

Common, row and column effects

How do you access the global effect stored in the df.pol object?

Using df.pol$global

How do you access the row effects stored in the df.pol object?

Using df.pol$row

Study Notes

STT157 Exploratory Data Analysis (EDA)

  • The course is about exploratory data analysis (EDA)
  • The subject is Median Polish, a data analysis technique

Median Polish

  • Median polish is a method introduced by John Tukey
  • It's a simple and robust technique in EDA
  • It's used to extract effects from a two-way table
  • The method is robust to outliers because it uses medians instead of means for analysis
  • It is more robust to outliers than ANOVA, which estimates row and column effects with means; unlike ANOVA, however, it is exploratory and does not test the significance of factors.
  • It iteratively extracts row and column effects to characterize the factors contributing to the expected value.
  • The aim is to pinpoint each factor's role by progressively subtracting the row and column effects.
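To see why using medians makes the method robust, here is a minimal Python sketch comparing a mean-based and a median-based estimate of a single row effect when one cell is an outlier. Python is used only for illustration, and the row values are made up.

```python
from statistics import mean, median

# Hypothetical row of a two-way table (made-up numbers); 95 is an outlier.
row = [10.0, 12.0, 11.0, 13.0, 95.0]

# Row "effect" estimated two ways
mean_effect = mean(row)      # pulled strongly toward the outlier
median_effect = median(row)  # depends only on the middle value

# Residuals left after subtracting each estimate from the row
resid_mean = [x - mean_effect for x in row]
resid_median = [x - median_effect for x in row]

print("mean effect:", mean_effect, "residuals:", resid_mean)
print("median effect:", median_effect, "residuals:", resid_median)
```

With the median, the four typical cells keep residuals within ±2 and the outlier is isolated in a single residual of 83. Replacing 95 with 950 leaves the median at 12 and those four residuals unchanged, while the mean rises to 199.2.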

Steps for Conducting Median Polish

  • Take the median of each row and record it next to the row. Subtract the row median from every value in that row.
  • Compute the median of the row medians, record it as the overall effect, and subtract this effect from each row median.
  • Take the median of each column of the resulting residual table, record it beneath the column, and subtract it from every value in that column.
  • Compute the median of the column medians, add it to the overall effect, and subtract it from each column median.
  • Repeat these four steps on the residual table, adding each new row and column median to the existing row and column effects, until the row and column medians are zero or close to it.
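The four steps above can be sketched in Python as a single pass over a small table. This is a minimal illustration, not R's medpolish() implementation: the helper name polish_pass and the 3x3 table (with 40 as a deliberate outlier) are made up for the example.

```python
from statistics import median

def polish_pass(resid, overall, row_eff, col_eff):
    """Hypothetical helper: apply the four steps once.

    Keeps data[i][j] == overall + row_eff[i] + col_eff[j] + resid[i][j]
    true after every step.
    """
    n_rows, n_cols = len(resid), len(resid[0])
    resid = [list(r) for r in resid]  # work on copies
    row_eff, col_eff = list(row_eff), list(col_eff)

    # Step 1: subtract each row's median from that row; add it to the row effect
    for i in range(n_rows):
        m = median(resid[i])
        resid[i] = [x - m for x in resid[i]]
        row_eff[i] += m

    # Step 2: move the median of the row effects into the overall effect
    m = median(row_eff)
    overall += m
    row_eff = [r - m for r in row_eff]

    # Step 3: subtract each column's median from that column; add it to the column effect
    for j in range(n_cols):
        m = median([resid[i][j] for i in range(n_rows)])
        for i in range(n_rows):
            resid[i][j] -= m
        col_eff[j] += m

    # Step 4: move the median of the column effects into the overall effect
    m = median(col_eff)
    overall += m
    col_eff = [c - m for c in col_eff]

    return resid, overall, row_eff, col_eff

# Hypothetical 3x3 table (made-up values); 40 is an outlier
data = [[10.0, 14.0, 12.0],
        [20.0, 25.0, 21.0],
        [15.0, 19.0, 40.0]]

resid, overall, row_eff, col_eff = polish_pass(data, 0.0, [0.0] * 3, [0.0] * 3)
print("overall:", overall)
print("row effects:", row_eff)
print("column effects:", col_eff)
print("residuals:", resid)
```

After one pass the overall effect is 19, the outlier shows up as a residual of 21, and every other residual is within ±2. The residual rows still have non-zero medians (0, 1 and -2), which is why the steps are repeated.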

Fit for Each Cell

  • The fit for each cell (row i, column j) equals the common term plus the row effect for row i plus the column effect for column j.
  • Residuals (residualᵢⱼ) are the differences between the raw data (dataᵢⱼ) and the fitted values (fitᵢⱼ): residualᵢⱼ = dataᵢⱼ - fitᵢⱼ
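The fit and residual formulas translate directly into code. In this Python sketch the common term, the effects, and the 3x3 data table are illustrative values only, not output from the lesson's example.

```python
# Hypothetical effects and data for a 3x3 table (illustrative values only)
common = 19.0
row_effect = [-7.0, 2.0, 0.0]
col_effect = [-2.0, 2.0, 0.0]
data = [[10.0, 14.0, 12.0],
        [20.0, 25.0, 21.0],
        [15.0, 19.0, 40.0]]

# fit_ij = common + row_effect_i + col_effect_j
fit = [[common + r + c for c in col_effect] for r in row_effect]

# residual_ij = data_ij - fit_ij
residual = [[d - f for d, f in zip(d_row, f_row)]
            for d_row, f_row in zip(data, fit)]

print("fit:", fit)
print("residual:", residual)
```

By construction, fit plus residual reproduces each data value exactly; the large residual (21) flags the cell that the additive fit does not describe well.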

Model

  • The resulting model is additive; the response variable (yᵢⱼ) is equal to the common value (μ) plus the row effect (αᵢ) plus the column effect (βⱼ) plus the residual (εᵢⱼ).
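Written with the symbols above, the fitted model and the conditions a converged polish approximately satisfies are summarized below. With the step order listed earlier, each pass ends with the row and column effects centered at median zero. At convergence, the residual medians in every row and every column are close to zero.

```latex
y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij},
\qquad
\operatorname{med}_i \alpha_i = 0, \quad \operatorname{med}_j \beta_j = 0,
\qquad
\operatorname{med}_j \varepsilon_{ij} \approx 0 \ \text{for each } i, \quad
\operatorname{med}_i \varepsilon_{ij} \approx 0 \ \text{for each } j.
```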

Example (Infant Mortality)

  • Example data presents infant mortality by region and father's education level in the United States (1964-1966)
  • Data is reported as the number of deaths per 1000 live births.

Stopping Iteration

  • Iterate the row and column sweeps until the medians of the residual rows and columns approach zero.
  • Avoid iterating indefinitely: as Hoaglin et al. (1983) suggest, a few iterations are usually sufficient.
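A sketch of the full loop with a stopping rule follows. The rule is modeled on the one documented for R's medpolish(): stop after maxiter passes, or once the proportional change in the sum of absolute residuals is below eps. The function name median_polish, the returned dictionary, and the example table are hypothetical, and the sweep order follows the steps listed earlier, which may differ in detail from R's internal ordering.

```python
from statistics import median

def median_polish(table, eps=0.01, maxiter=10, trace=True):
    """Hypothetical helper: median polish of a two-way table (list of rows).

    Stops after `maxiter` passes, or earlier once the proportional change in
    the sum of absolute residuals is below `eps`.
    """
    resid = [[float(x) for x in row] for row in table]
    n_rows, n_cols = len(resid), len(resid[0])
    overall, row_eff, col_eff = 0.0, [0.0] * n_rows, [0.0] * n_cols
    old_sum, it = 0.0, 0
    for it in range(1, maxiter + 1):
        # Row sweep, then move the median row effect into the overall effect
        for i in range(n_rows):
            m = median(resid[i])
            resid[i] = [x - m for x in resid[i]]
            row_eff[i] += m
        m = median(row_eff)
        overall += m
        row_eff = [r - m for r in row_eff]

        # Column sweep, then move the median column effect into the overall effect
        for j in range(n_cols):
            m = median([resid[i][j] for i in range(n_rows)])
            for i in range(n_rows):
                resid[i][j] -= m
            col_eff[j] += m
        m = median(col_eff)
        overall += m
        col_eff = [c - m for c in col_eff]

        # Stopping rule based on the sum of absolute residuals
        new_sum = sum(abs(x) for row in resid for x in row)
        if trace:
            print(f"{it}: {new_sum}")
        if new_sum == 0 or abs(new_sum - old_sum) < eps * new_sum:
            break
        old_sum = new_sum

    return {"overall": overall, "row": row_eff, "col": col_eff,
            "residuals": resid, "iterations": it}

# Hypothetical 3x3 table (made-up values); 40 is a deliberate outlier
data = [[10, 14, 12],
        [20, 25, 21],
        [15, 19, 40]]
result = median_polish(data)
```

On this table the loop prints 1: 28.0, 2: 25.0 and 3: 25.0, then stops after the third pass because the sum of absolute residuals no longer changes. The overall effect settles at 17 and the outlier remains as the single large residual (23). Capping the loop with maxiter reflects the advice above that a few iterations are usually enough.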

Implementing Median Polish in R

  • Base R (the stats package) has a built-in function, medpolish(), that implements median polish.
  • The maxiter parameter sets the maximum number of iterations (default 10). Iteration also stops earlier once the proportional change in the sum of absolute residuals falls below the tolerance eps (default 0.01).
  • The data must be loaded first; medpolish() expects a two-way table (a matrix in wide format) rather than long-format data. Sample R code is provided.
  • The result of medpolish() is stored in an object named df.med, a list whose components include overall, row, col, and residuals.
  • The console output from medpolish() shows the sum of absolute residuals at each step of the iteration.

Using eda_pol

  • The tukeyedar package provides a custom function, eda_pol().
  • Unlike the base medpolish() function, which takes a wide matrix, eda_pol() requires the data in long format; it also outputs the polished table as a graphic.
  • Example R code demonstrating how to use eda_pol() is provided.
  • The df.pol object (a list) stores the polished table's values, including the common, row, and column effects.
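Because medpolish() and eda_pol() expect different layouts, it helps to see what converting between them involves. This Python sketch pivots long-format records (row label, column label, value) into a wide table. The helper name long_to_wide and the sample records are hypothetical.

```python
def long_to_wide(records):
    """Hypothetical helper: pivot (row_label, col_label, value) records into a wide table.

    Returns (row_labels, col_labels, matrix). Labels keep first-seen order,
    missing combinations become None, and a repeated (row, col) pair keeps
    the last value seen.
    """
    row_labels, col_labels = [], []
    cells = {}
    for r, c, v in records:
        if r not in row_labels:
            row_labels.append(r)
        if c not in col_labels:
            col_labels.append(c)
        cells[(r, c)] = v
    matrix = [[cells.get((r, c)) for c in col_labels] for r in row_labels]
    return row_labels, col_labels, matrix

# Hypothetical long-format data: one record per (row level, column level) cell
long_data = [
    ("A", "x", 10.0), ("A", "y", 14.0),
    ("B", "x", 20.0), ("B", "y", 25.0),
]
rows, cols, wide = long_to_wide(long_data)
print(rows, cols, wide)
```

Each long-format record becomes one cell of the wide table, so a complete long table with r row levels and c column levels yields an r-by-c matrix.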

Description

This quiz covers the concept of Median Polish, a technique in exploratory data analysis introduced by John Tukey. It focuses on how this method robustly extracts effects from two-way tables and the steps involved in conducting the analysis. Test your understanding of its significance and applications in EDA.
