# Lecture 28: Mutual Information

## Motivation

* How much does $X$ tell us about $Y$?
* How much does $Y$ tell us about $X$?
* If $X$ and $Y$ are independent, then knowing $X$ tells us nothing about $Y$ (and vice versa).
* If $X = Y$, then knowing $X$ tells us everything about $Y$ (and vice versa).

## Definition

The **mutual information** between two discrete random variables $X$ and $Y$ is defined as

$$
I(X; Y) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)},
$$

where:

* $\mathcal{X}$ is the alphabet of $X$.
* $\mathcal{Y}$ is the alphabet of $Y$.

## Remarks

* $I(X; Y) \geq 0$ (by the log-sum inequality).
* $I(X; Y) = I(Y; X)$.
* $I(X; Y) = 0$ if and only if $X$ and $Y$ are independent.

## Alternative Expressions

$$
\begin{aligned}
I(X; Y) &= H(X) - H(X \mid Y) \\
I(X; Y) &= H(Y) - H(Y \mid X) \\
I(X; Y) &= H(X) + H(Y) - H(X, Y)
\end{aligned}
$$

## Example 1

Let $X \sim \text{Bern}(1/2)$, and let $Y = f(X)$, where

$$
f(x) =
\begin{cases}
0, & \text{if } x = 0 \\
1, & \text{if } x = 1
\end{cases}
$$

Then $Y = X$, so $H(Y) = H(X) = 1$ bit. Since $Y$ is entirely determined by $X$, $H(Y \mid X) = 0$. Therefore,

$$
I(X; Y) = H(Y) - H(Y \mid X) = 1 - 0 = 1 \text{ bit.}
$$

## Example 2

Let $X \sim \text{Bern}(1/2)$, and let $Y \sim \text{Bern}(1/2)$ be independent of $X$. Then $H(Y) = 1$ bit. Since $X$ and $Y$ are independent, $H(Y \mid X) = H(Y) = 1$. Therefore,

$$
I(X; Y) = H(Y) - H(Y \mid X) = 1 - 1 = 0 \text{ bits.}
$$

## Example 3

Let $(X, Y) \sim p(x, y)$ be given by

| $p(x, y)$ | $x = 0$ | $x = 1$ |
| :-------- | :------ | :------ |
| $y = 0$   | $1/8$   | $1/16$  |
| $y = 1$   | $1/16$  | $1/8$   |
| $y = 2$   | $1/4$   | $3/8$   |

The marginals are given by

$$
\begin{aligned}
p(x = 0) &= \frac{1}{8} + \frac{1}{16} + \frac{1}{4} = \frac{7}{16} \\
p(x = 1) &= \frac{1}{16} + \frac{1}{8} + \frac{3}{8} = \frac{9}{16} \\
p(y = 0) &= \frac{1}{8} + \frac{1}{16} = \frac{3}{16} \\
p(y = 1) &= \frac{1}{16} + \frac{1}{8} = \frac{3}{16} \\
p(y = 2) &= \frac{1}{4} + \frac{3}{8} = \frac{5}{8}
\end{aligned}
$$

Therefore,

$$
\begin{aligned}
I(X; Y) &= \sum_{x \in \{0, 1\}} \sum_{y \in \{0, 1, 2\}} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)} \\
&= \frac{1}{8} \log \frac{1/8}{(7/16)(3/16)} + \frac{1}{16} \log \frac{1/16}{(7/16)(3/16)} + \frac{1}{4} \log \frac{1/4}{(7/16)(5/8)} \\
&\quad + \frac{1}{16} \log \frac{1/16}{(9/16)(3/16)} + \frac{1}{8} \log \frac{1/8}{(9/16)(3/16)} + \frac{3}{8} \log \frac{3/8}{(9/16)(5/8)} \\
&\approx 0.0375 \text{ bits.}
\end{aligned}
$$
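## Numerical Check

As a sanity check on the definition, here is a minimal Python sketch that computes $I(X; Y)$ directly from a joint pmf table; the function name `mutual_information` and the array layout are illustrative choices, not something from the lecture. Run on the Example 3 table, it reproduces the value above.

```python
import numpy as np

def mutual_information(joint):
    """I(X; Y) in bits, computed directly from the definition.

    joint[i, j] holds p(x = i, y = j): rows index x, columns index y.
    """
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1, keepdims=True)  # marginal p(x), shape (|X|, 1)
    py = joint.sum(axis=0, keepdims=True)  # marginal p(y), shape (1, |Y|)
    mask = joint > 0                       # skip zero cells: 0 log 0 = 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / (px * py)[mask])))

# Joint pmf from Example 3 (rows: x = 0, 1; columns: y = 0, 1, 2).
joint = np.array([[1/8, 1/16, 1/4],
                  [1/16, 1/8, 3/8]])
print(mutual_information(joint))  # ~0.0375 bits
```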
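The alternative expression $I(X; Y) = H(X) + H(Y) - H(X, Y)$ can be verified numerically on the same table. Again a sketch under the same assumptions; `entropy` is a hypothetical helper, not part of the lecture.

```python
def entropy(p):
    """Shannon entropy in bits of a pmf given as an array of any shape."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]                           # 0 log 0 = 0 by convention
    return float(-np.sum(p * np.log2(p)))

h_x = entropy(joint.sum(axis=1))   # H(X)
h_y = entropy(joint.sum(axis=0))   # H(Y)
h_xy = entropy(joint)              # H(X, Y)
print(h_x + h_y - h_xy)            # ~0.0375, matching mutual_information(joint)
```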