2.2 Joint Entropy and Conditional Entropy
Definition. The joint entropy \(H(X, Y)\) of a pair of discrete random variables \((X, Y)\) with a joint distribution \(p(x, y)\) is defined as
\[H(X, Y) = - \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(x, y) = - E \log p(X, Y)\]
Definition. If \((X, Y) \sim p(x, y)\), the conditional entropy \(H(Y \mid X)\) is defined as
\[\begin{split}H(Y \mid X) & = \sum_{x \in \mathcal{X}} p(x) H(Y \mid X = x) \\
& = - \sum_{x \in \mathcal{X}} p(x) \sum_{y \in \mathcal{Y}} p(y \mid x) \log p(y \mid x) \\
& = - \sum_{x \in \mathcal{X}}\sum_{y \in \mathcal{Y}} p(x, y) \log p(y \mid x) \\
& = - E \log p(Y \mid X)\end{split}\]
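The two definitions above can be evaluated directly on a small joint table. The following is a minimal sketch, not a reference implementation: the distribution `joint` is a made-up example, the helper names are hypothetical, the logarithm is assumed to be base 2 (entropy in bits), and terms with \(p(x, y) = 0\) are skipped under the usual convention \(0 \log 0 = 0\).

```python
import math

# Hypothetical joint distribution p(x, y), chosen only for illustration.
# Keys are (x, y) pairs; values are probabilities summing to 1.
joint = {
    ("a", 0): 0.5,
    ("a", 1): 0.25,
    ("b", 0): 0.0,
    ("b", 1): 0.25,
}


def marginal_x(joint):
    """Return p(x) by summing p(x, y) over y."""
    px = {}
    for (x, _), p in joint.items():
        px[x] = px.get(x, 0.0) + p
    return px


def joint_entropy(joint):
    """H(X, Y) = -sum_{x,y} p(x, y) log2 p(x, y), skipping zero terms."""
    return -sum(p * math.log2(p) for p in joint.values() if p > 0)


def conditional_entropy_y_given_x(joint):
    """H(Y | X) = -sum_{x,y} p(x, y) log2 p(y | x), with p(y | x) = p(x, y) / p(x)."""
    px = marginal_x(joint)
    return -sum(
        p * math.log2(p / px[x]) for (x, _), p in joint.items() if p > 0
    )


h_xy = joint_entropy(joint)
h_y_given_x = conditional_entropy_y_given_x(joint)
print(f"H(X, Y) = {h_xy:.4f} bits")
print(f"H(Y | X) = {h_y_given_x:.4f} bits")
```

For this table, \(H(X, Y)\) is 1.5 bits and \(H(Y \mid X)\) is roughly 0.689 bits. The conditional entropy uses the third line of the definition, which avoids building each conditional distribution \(p(y \mid x)\) separately.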
Theorem 2.2.1 (Chain Rule). \(H(X, Y) = H(X) + H(Y \mid X)\).
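Proof sketch. The chain rule follows from the factorization \(p(x, y) = p(x) p(y \mid x)\), assuming \(H(X) = - \sum_{x \in \mathcal{X}} p(x) \log p(x)\) as the definition of entropy:
\[\begin{split}H(X, Y) & = - \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(x, y) \\
& = - \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log \bigl( p(x) p(y \mid x) \bigr) \\
& = - \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(x) - \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(y \mid x) \\
& = - \sum_{x \in \mathcal{X}} p(x) \log p(x) - \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(y \mid x) \\
& = H(X) + H(Y \mid X)\end{split}\]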
Corollary. \(H(X, Y \mid Z) = H(X \mid Z) + H(Y \mid X, Z)\).
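Remark. In general, conditional entropy is not symmetric in its two arguments, but the reduction in entropy due to conditioning is: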
\[\begin{split}& H(Y \mid X) \neq H(X \mid Y) \\
& H(X) - H(X \mid Y) = H(Y) - H(Y \mid X)\end{split}\]
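Both statements in the remark can be checked numerically. The sketch below is illustrative only: the joint table and helper names are hypothetical, logarithms are assumed to be base 2, and both conditional entropies are computed directly from the definition rather than through the chain rule, so the equality is a genuine check.

```python
import math

# Hypothetical joint distribution p(x, y); keys are (x, y) pairs.
joint = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.0, (1, 1): 0.25}


def marginal(joint, axis):
    """Marginal distribution of X (axis=0) or Y (axis=1)."""
    out = {}
    for key, p in joint.items():
        out[key[axis]] = out.get(key[axis], 0.0) + p
    return out


def entropy(dist):
    """Entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def conditional_entropy(joint, given_axis):
    """Entropy of the other variable given the one on `given_axis`.

    Computed as -sum p(x, y) log2( p(x, y) / p(given) ), skipping zero terms.
    """
    p_given = marginal(joint, given_axis)
    return -sum(
        p * math.log2(p / p_given[key[given_axis]])
        for key, p in joint.items()
        if p > 0
    )


h_x = entropy(marginal(joint, 0))
h_y = entropy(marginal(joint, 1))
h_y_given_x = conditional_entropy(joint, given_axis=0)  # H(Y | X)
h_x_given_y = conditional_entropy(joint, given_axis=1)  # H(X | Y)

print(f"H(Y | X) = {h_y_given_x:.4f}, H(X | Y) = {h_x_given_y:.4f}")
print(f"H(X) - H(X | Y) = {h_x - h_x_given_y:.4f}")
print(f"H(Y) - H(Y | X) = {h_y - h_y_given_x:.4f}")
```

For this table the two conditional entropies differ (about 0.689 bits versus 0.5 bits), while the two differences agree up to floating-point rounding.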