STT157 Exploratory Data Analysis: Median Polish

Questions and Answers

What is the name of the method introduced by John Tukey used in Exploratory Data Analysis?

Median Polish

What kind of table does Median Polish use to extract effects?

Two-way table

Median Polish is robust to outliers because it uses medians instead of means.

True (A)

Median Polish is more robust than ANOVA for examining the effects of factors in a multifactor model.

True (A)

How is Median Polish used to analyze data and identify the role of each factor?

By iteratively extracting the effects associated with the row and column factors via medians.

Describe the goal of the first step in conducting Median Polish according to John Tukey.

Take the median of each row and subtract it from every value in that row.

What does the overall effect represent in the second step of Median Polish?

The median of the row medians.

What formula gives the fit for each cell in a two-way table with row 'i' and column 'j'?

fit_ij = common_term + row_effect_i + column_effect_j

What formula gives the residual for each cell in a two-way table with row 'i' and column 'j'?

residual_ij = data_ij - fit_ij

What is the name of the parameter used in the medpolish() function that allows you to control the maximum number of iterations?

maxiter

The medpolish() function assumes the data is pre-formatted in a long form.

False (B)

The eda_pol() function requires that the data be in long format.

True (A)

What kind of output does the eda_pol() function provide?

A graphic representation of the polished table.

What is the name of the argument used in the eda_pol() function to sort the output values by their effect size?

sort

What kind of object is the df.pol object in R?

A list

What are the three key output values stored in the df.pol object?

Common, row and column effects

How do you access the global effect stored in the df.pol object?

Using df.pol$global

How do you access the row effects stored in the df.pol object?

Using df.pol$row

Flashcards

What is Median Polish?

Median polish is a method for analyzing two-way tables that helps identify the effect of row and column factors on a response variable.

How does median polish work?

Median polish alternately subtracts row medians and column medians from the table, adding each extracted median to the corresponding effect, and repeats until the newly extracted medians are close to zero, meaning the effects have stabilized.

What is a 'row effect' in median polish?

The row effect represents the influence of a particular row on the response variable. For example, a positive row effect might indicate a higher response value for that row.

What is a 'column effect' in median polish?

The column effect represents the influence of a particular column on the response variable.

What is a 'residual' in median polish?

The residual is the difference between the observed value and the fitted value (the common value plus the row and column effects).

What is the 'common value' in median polish?

The common value (μ) in median polish represents the overall baseline level of the response variable, based on the grand median calculated during the iterative process.

What makes median polish robust?

Median polish is particularly useful when the data contain outliers: it uses medians, which barely move when a few values are extreme, rather than means, which are pulled strongly by them.

When does median polish stop?

In the final iteration of median polish, the row and column medians extracted from the residual table should be close to zero, indicating that the row and column effects have been effectively extracted.

What is the goal of median polish?

The goal of median polish is to understand how the effects of row and column factors contribute to the variation in the response variable.

How is a table cell represented in median polish?

Each cell in the table can be modeled as the sum of the common term, the row effect, the column effect, and the residual value.

What can we learn from median polish?

By comparing the sizes of the row and column effects, you can identify which factors have the largest impact on the response variable.

How can median polish be implemented in R?

Median polish can be implemented in R using the 'medpolish()' function. By setting the 'maxiter' parameter, you can control the maximum number of iterations.

Row effect

The row effect is the amount a particular row deviates from the common value.

Column effect

The column effect is the amount a particular column deviates from the common value.

Two-way table

A two-way table can be used to represent the relationship between a response variable and two categorical factors.

Residual value

The residual is what is left of each observed value after the fit is subtracted; residuals close to zero indicate that the additive model captures the data well.

Common term

The common term in median polish represents the overall level shared by all cells; it is built from medians, not averages.

Robustness of Median Polish

Median polish is a robust technique for data analysis, meaning it's less likely to be affected by outliers.

Purpose of Median Polish

Median polish helps us understand how row and column factors influence the response variable within a two-way table.

R function for Median Polish

The 'medpolish()' function in R provides an efficient way to perform median polish analysis.

Controlling Iterations in Median Polish

The 'maxiter' parameter in the 'medpolish()' function allows you to set the maximum number of iterations for the analysis.

Applications of Median Polish

Median polish is a useful tool for analyzing data in two-way tables, especially when outliers are present.

Components of Median Polish Analysis

The common term, row effects, column effects, and residual values together provide a comprehensive breakdown of the response variable within a two-way table.

Interpreting Median Polish Results

By analyzing the results of median polish, you can gain valuable insights into the relationships between variables in a two-way table.

Importance of Understanding Median Polish

Understanding median polish empowers data analysts to effectively analyze and interpret data from two-way tables, leading to more informed decisions.

Fields of Application for Median Polish

Median polish can be used in a variety of fields where two-way tables are common, such as healthcare, economics, and social sciences.

Benefits of Using Median Polish

By using median polish, you can gain a deeper understanding of the relationships between variables in two-way tables, leading to more informed and insightful conclusions.

Study Notes

STT157 Exploratory Data Analysis (EDA)

  • The course is about exploratory data analysis (EDA)
  • The subject is Median Polish, a data analysis technique

Median Polish

  • Median polish is a method introduced by John Tukey
  • It's a simple and robust technique in EDA
  • It's used to extract effects from a two-way table
  • The method is robust to outliers because it uses medians instead of means for analysis
  • It's a more robust alternative to ANOVA for examining the effects of different factors in a multi-factor model
  • It iteratively extracts row and column effects to characterize the factors contributing to the expected value.
  • The aim is to pinpoint each factor's role by progressively removing the row and column effects from the table.
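A quick way to see why medians make the method robust: in this minimal Python sketch (the numbers are invented, not from the lesson), replacing one value in a row with an extreme value drags the mean far away while the median does not move.

```python
from statistics import mean, median

# Invented values for one row of a table, before and after a single
# value is replaced by an extreme one.
clean = [20.0, 22.0, 21.0, 23.0, 24.0]
with_outlier = [20.0, 22.0, 21.0, 23.0, 240.0]

for label, row in [("clean", clean), ("with outlier", with_outlier)]:
    print(f"{label}: mean = {mean(row):.1f}, median = {median(row):.1f}")
# clean: mean = 22.0, median = 22.0
# with outlier: mean = 65.2, median = 22.0
```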

Steps for Conducting Median Polish

  • Take the median of each row and record it next to the row. Subtract the row median from every value in that row.
  • Compute the median of row medians, consider it the overall effect, and subtract this effect from each row median.
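  • Take each column's median, record it beneath the column, and subtract each column median from the values in that column.
  • Compute the median of the column medians and add it to the current overall effect. Subtract that same median (not the new overall effect) from each column median.
  • Repeat steps 1-4 on the resulting residual table, adding each newly extracted row or column median to the existing effect, until a sweep no longer changes the row or column effects (the newly extracted medians are zero or close to it).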
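The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Tukey's or R's code: the table is a list of equal-length rows with no missing values, `median_polish` is a hypothetical helper name, each repeated pass adds the new row or column medians to the existing effects, and the median in steps 2 and 4 is taken over the accumulated effects. The small table is invented and contains one deliberate outlier.

```python
from statistics import median


def median_polish(table, max_iter=10, tol=1e-9):
    """Median polish of a two-way table given as a list of equal-length rows.

    Returns (overall, row_effects, col_effects, residuals).
    """
    resid = [[float(v) for v in row] for row in table]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols

    for _ in range(max_iter):
        # Step 1: take each row's median, add it to that row's effect,
        # and subtract it from every value in the row.
        row_meds = [median(row) for row in resid]
        for i, m in enumerate(row_meds):
            row_eff[i] += m
            resid[i] = [v - m for v in resid[i]]

        # Step 2: move the median of the row effects into the overall effect.
        m = median(row_eff)
        overall += m
        row_eff = [r - m for r in row_eff]

        # Step 3: take each column's median, add it to that column's effect,
        # and subtract it from every value in the column.
        col_meds = [median(resid[i][j] for i in range(n_rows)) for j in range(n_cols)]
        for j, m in enumerate(col_meds):
            col_eff[j] += m
            for i in range(n_rows):
                resid[i][j] -= m

        # Step 4: move the median of the column effects into the overall effect.
        m = median(col_eff)
        overall += m
        col_eff = [c - m for c in col_eff]

        # Step 5: stop once a full pass extracts (almost) nothing new.
        if all(abs(x) <= tol for x in row_meds + col_meds):
            break

    return overall, row_eff, col_eff, resid


# Invented 3 x 3 table; the bottom-right cell is a deliberate outlier.
table = [
    [10, 11, 13],
    [12, 13, 15],
    [15, 16, 60],
]
overall, rows, cols, resid = median_polish(table)
print(overall, rows, cols)  # 13.0 [-2.0, 0.0, 3.0] [-1.0, 0.0, 2.0]
print(resid[2][2])          # 42.0
```

Because every step moves the same amount out of one part of the decomposition and into another, data = overall + row effect + column effect + residual holds for every cell. In this example the outlier ends up entirely in a single residual (42.0) instead of distorting the effects.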

Fit for Each Cell

  • The fit for each cell (row i, column j) equals a common term plus the row effect (i) plus the column effect (j).
  • Residuals (residualᵢⱼ) are differences between raw data (dataᵢⱼ) and fitted values (fitᵢⱼ). residualᵢⱼ = dataᵢⱼ - fitᵢⱼ
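The two formulas can be applied to a whole table at once. A minimal Python sketch; `fitted_table` and `residual_table` are hypothetical helper names, and the effects and data are invented rather than taken from the lesson's example.

```python
def fitted_table(common, row_eff, col_eff):
    """fit_ij = common + row_effect_i + column_effect_j for every cell."""
    return [[common + r + c for c in col_eff] for r in row_eff]


def residual_table(data, fitted):
    """residual_ij = data_ij - fit_ij for every cell."""
    return [[d - f for d, f in zip(d_row, f_row)] for d_row, f_row in zip(data, fitted)]


# Invented effects and data for a 2 x 3 table (not the lesson's example).
common = 20.0
row_eff = [2.5, -1.0]
col_eff = [-1.0, 0.0, 3.0]
data = [[23.0, 22.0, 25.0],
        [19.0, 19.5, 22.0]]

fit = fitted_table(common, row_eff, col_eff)
res = residual_table(data, fit)
print(fit)  # [[21.5, 22.5, 25.5], [18.0, 19.0, 22.0]]
print(res)  # [[1.5, -0.5, -0.5], [1.0, 0.5, 0.0]]
```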

Model

  • The resulting model is additive; the response variable (yᵢⱼ) is equal to the common value (μ) plus the row effect (αᵢ) plus the column effect (βⱼ) plus the residual (εᵢⱼ).
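The same model in LaTeX notation, with the fit and residual definitions, the centering that follows from steps 2 and 4 in the step order listed above (the median of the row effects and the median of the column effects are moved into μ, leaving both at zero), and, as an approximate condition that holds only once the iterations have settled, the rule that the row and column medians of the residuals are near zero:

```latex
\begin{aligned}
y_{ij} &= \mu + \alpha_i + \beta_j + \varepsilon_{ij}, \\
\mathrm{fit}_{ij} &= \mu + \alpha_i + \beta_j, \qquad
\varepsilon_{ij} = y_{ij} - \mathrm{fit}_{ij}, \\
\operatorname{med}_i \alpha_i &= 0, \qquad \operatorname{med}_j \beta_j = 0, \\
\operatorname{med}_j \varepsilon_{ij} &\approx 0 \ \text{for each } i, \qquad
\operatorname{med}_i \varepsilon_{ij} \approx 0 \ \text{for each } j.
\end{aligned}
```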

Example (Infant Mortality)

  • Example data present infant mortality by region and father's education level in the United States (1964-1966)
  • Data is reported as the number of deaths per 1000 live births.

Stopping Iteration

  • Iterate the row and column sweeps until the medians extracted in each sweep approach zero
  • Avoid iterating indefinitely; as Hoaglin et al. (1983) suggest, a few iterations are usually sufficient.
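The stopping rule can be expressed as a check on the residual table: compute its row and column medians and stop when all of them are within a small tolerance of zero, paired with a small cap on the number of passes in the spirit of Hoaglin et al.'s advice. A minimal Python sketch; `effects_settled` and the tolerance value are illustrative assumptions, not part of any library, and the residual tables are invented.

```python
from statistics import median


def effects_settled(resid, tol=1e-6):
    """Return True when every row median and every column median of the
    residual table is within tol of zero, so another sweep would move
    (almost) nothing into the row or column effects."""
    n_rows, n_cols = len(resid), len(resid[0])
    row_meds = [median(row) for row in resid]
    col_meds = [median(resid[i][j] for i in range(n_rows)) for j in range(n_cols)]
    return all(abs(m) <= tol for m in row_meds + col_meds)


# Invented residual tables.
settled = effects_settled([[0.0, 0.0, 0.0],
                           [0.0, 0.0, 0.0],
                           [0.0, 0.0, 42.0]])   # a lone outlier does not block stopping
not_yet = effects_settled([[-1.0, 0.0, 2.0],
                           [-1.0, 0.0, 2.0]])   # column medians -1 and 2 remain
print(settled, not_yet)  # True False
```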

Implementing Median Polish in R

  • R has a built-in function medpolish() to implement median polish.
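  • Users can set the maxiter parameter to define the maximum number of iterations (default 10). The function also stops earlier once the proportional reduction in the sum of absolute residuals falls below the tolerance eps (default 0.01); it does not search for a "best" number of iterations.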
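  • The data must be loaded first as a wide table (a matrix or an all-numeric data frame); the lesson provides sample R code.
  • In the example, the result of medpolish() is stored in df.med, a list of class medpolish with the components overall, row, col and residuals.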
  • The console output from the medpolish() function shows the sum of absolute residuals at each step during the iteration.
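R's documentation for medpolish() describes its stopping rule: iteration continues until the proportional reduction in the sum of absolute residuals is less than eps (default 0.01), or until maxiter iterations (default 10) have run. The Python sketch below mirrors that rule for illustration only; it is not R's code, the helper names are hypothetical, and the trace of sums is invented rather than taken from the lesson's console output.

```python
def medpolish_converged(old_sum, new_sum, eps=0.01):
    """Stop when the sum of |residuals| is zero or changed by less than
    eps relative to its new value (the rule described for R's medpolish)."""
    return new_sum == 0 or abs(new_sum - old_sum) < eps * new_sum


def iterations_used(abs_resid_sums, eps=0.01, maxiter=10):
    """Given the sum of |residuals| after each iteration, return how many
    iterations an eps/maxiter rule would run before stopping."""
    old_sum = 0.0
    for it, new_sum in enumerate(abs_resid_sums[:maxiter], start=1):
        if medpolish_converged(old_sum, new_sum, eps):
            return it
        old_sum = new_sum
    return min(len(abs_resid_sums), maxiter)


trace = [40.0, 25.0, 24.9, 24.9]          # invented sums of absolute residuals
print(iterations_used(trace))             # 3
print(iterations_used(trace, maxiter=2))  # 2
```

Because the test is relative to the current sum, a large table with a large total absolute residual can stop after a change that would be significant for a small one; lowering eps or raising maxiter trades run time for a tighter fit.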

Using eda_pol

  • The tukeyedar package provides a custom function eda_pol
  • eda_pol takes data in long format and presents the polished table as a graphic; the base medpolish() function instead expects a wide table.
  • Example R code demonstrating how to use eda_pol is provided
  • The df.pol object is a list that stores the polished table's common (global), row and column effects.
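The long/wide distinction matters because the two functions expect different shapes: medpolish() works on a wide table (one row per level of the row factor, one column per level of the column factor), while eda_pol() takes long-format data with one record per cell. A minimal Python sketch of reshaping long to wide; `long_to_wide` and the field names are hypothetical, and the values are invented.

```python
def long_to_wide(records, row_key, col_key, val_key):
    """Pivot long-format records (one dict per cell) into a wide two-way table.

    Returns sorted row labels, sorted column labels, and the table as a list
    of rows. Raises KeyError if a row/column combination is missing; if a
    cell appears twice, the last record wins.
    """
    row_labels = sorted({rec[row_key] for rec in records})
    col_labels = sorted({rec[col_key] for rec in records})
    cells = {(rec[row_key], rec[col_key]): rec[val_key] for rec in records}
    table = [[cells[(r, c)] for c in col_labels] for r in row_labels]
    return row_labels, col_labels, table


# Invented long-format data: one record per (row factor, column factor) cell.
long_data = [
    {"row_factor": "r1", "col_factor": "c1", "value": 25.0},
    {"row_factor": "r1", "col_factor": "c2", "value": 20.0},
    {"row_factor": "r2", "col_factor": "c1", "value": 30.0},
    {"row_factor": "r2", "col_factor": "c2", "value": 22.0},
]
row_labels, col_labels, wide = long_to_wide(long_data, "row_factor", "col_factor", "value")
print(row_labels, col_labels)  # ['r1', 'r2'] ['c1', 'c2']
print(wide)                    # [[25.0, 20.0], [30.0, 22.0]]
```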
