A very intuitive yet powerful inequality in information theory is the data processing inequality.
Lemma: If random variables $X$, $Y$, and $Z$ form a Markov chain $X \to Y \to Z$, then $I(X;Y) \ge I(X;Z)$.
The great thing about the inequality is that, unlike some results in information theory, it holds for both discrete and continuous random variables. (In fact it holds even for mixed cases where some variables are continuous and some are discrete.) Let's prove it assuming the variables are continuous.
Proof:
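$$I(X;Z) = h(X) - h(X|Z) \overset{(a)}{\le} h(X) - h(X|Y,Z) \overset{(b)}{=} h(X) - h(X|Y) = I(X;Y),$$
where (a) is because conditioning reduces entropy, i.e. $h(X|Y,Z) \le h(X|Z)$, and (b) is from the Markov property $p(x|y,z) = p(x|y)$.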
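As a quick sanity check, the inequality can be verified numerically on a small discrete Markov chain $X \to Y \to Z$. The sketch below is only an illustration: the distributions `p_x`, `p_y_given_x`, and `p_z_given_y` are made-up values, and the helper `mutual_information` is a hypothetical name introduced here.

```python
import math

def mutual_information(joint):
    """Mutual information I(A;B) in nats from a joint pmf joint[a][b]."""
    p_a = [sum(row) for row in joint]
    p_b = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for a, row in enumerate(joint):
        for b, p in enumerate(row):
            if p > 0:
                mi += p * math.log(p / (p_a[a] * p_b[b]))
    return mi

# Made-up example distributions (assumptions, chosen only for illustration).
p_x = [0.3, 0.7]                          # p(x)
p_y_given_x = [[0.9, 0.1], [0.2, 0.8]]    # p(y|x), rows indexed by x
p_z_given_y = [[0.7, 0.3], [0.4, 0.6]]    # p(z|y), rows indexed by y

# Joint of (X, Y): p(x, y) = p(x) p(y|x).
joint_xy = [[p_x[x] * p_y_given_x[x][y] for y in range(2)] for x in range(2)]

# Joint of (X, Z): the Markov property gives p(x, z) = sum_y p(x, y) p(z|y).
joint_xz = [
    [sum(joint_xy[x][y] * p_z_given_y[y][z] for y in range(2)) for z in range(2)]
    for x in range(2)
]

i_xy = mutual_information(joint_xy)
i_xz = mutual_information(joint_xz)
print(f"I(X;Y) = {i_xy:.4f} nats, I(X;Z) = {i_xz:.4f} nats")
```

Changing the transition matrices should never make `i_xz` exceed `i_xy`, since $Z$ depends on $X$ only through $Y$.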
Note that we can use this simple inequality to show some rather nontrivial results. Consider a continuous random variable $X$ and a continuous reversible (invertible) function $g$. Let $Y = g(X)$. Note that in general we have $h(Y) \ne h(X)$, even though we would have $H(Y) = H(X)$ if $X$ were discrete. However, for any other continuous random variable $Z$, $I(X;Z) = I(Y;Z)$ always holds. This can be easily seen by noting that both $Z \to X \to Y$ and $Z \to Y \to X$ hold, as $g$ is reversible. Thus we have both $I(Z;X) \ge I(Z;Y)$ and $I(Z;Y) \ge I(Z;X)$, and therefore
$$I(X;Z) = I(Y;Z).$$
One may also prove it from first principles, but that is going to be quite a bit harder.
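A jointly Gaussian example makes this contrast concrete. The sketch below assumes $X \sim N(0, \sigma_X^2)$, $Z = X + N$ with independent Gaussian noise $N$, and the invertible map $Y = aX$; these choices, the numeric values, and the helper names `gaussian_entropy` and `gaussian_mi` are assumptions made here for illustration. It uses the standard closed forms $h = \frac{1}{2}\ln(2\pi e \sigma^2)$ for a Gaussian and $I = -\frac{1}{2}\ln(1 - \rho^2)$ for a jointly Gaussian pair with correlation coefficient $\rho$.

```python
import math

def gaussian_entropy(var):
    """Differential entropy (nats) of a Gaussian with variance var."""
    return 0.5 * math.log(2 * math.pi * math.e * var)

def gaussian_mi(var_a, var_b, cov_ab):
    """Mutual information (nats) of a jointly Gaussian pair (A, B)."""
    rho_sq = cov_ab ** 2 / (var_a * var_b)
    return -0.5 * math.log(1 - rho_sq)

# Illustrative values (assumptions): X ~ N(0, 1), Z = X + N, N ~ N(0, 0.5), Y = 3X.
var_x, var_n, a = 1.0, 0.5, 3.0

var_z = var_x + var_n      # Var(Z) = Var(X) + Var(N), since N is independent of X
cov_xz = var_x             # Cov(X, Z) = Var(X)
var_y = a * a * var_x      # Var(Y) = a^2 Var(X)
cov_yz = a * cov_xz        # Cov(Y, Z) = a Cov(X, Z)

h_x = gaussian_entropy(var_x)
h_y = gaussian_entropy(var_y)
i_xz = gaussian_mi(var_x, var_z, cov_xz)
i_yz = gaussian_mi(var_y, var_z, cov_yz)

print(f"h(X) = {h_x:.4f}, h(Y) = {h_y:.4f}")          # differ by ln|a|
print(f"I(X;Z) = {i_xz:.4f}, I(Y;Z) = {i_yz:.4f}")    # agree
```

The differential entropies differ by $\ln|a|$, while the two mutual informations coincide because scaling leaves $\rho^2$ unchanged.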