# Jensen's inequality

by Daniel Pollithy

This is an illustrative explanation of Jensen’s inequality applied to probability.

## Jensen’s inequality

Remember that a convex function is a function whose epigraph (the set of points on or above its graph) is a convex set. Informally speaking, a twice-differentiable convex function has nonnegative curvature everywhere. A concave function has the opposite properties: its negative is convex.

Jensen’s inequality states that for $\theta \in [0,1]$, a convex function $f$ and any two points $x_1, x_2$ in its domain:

$f(\theta x_1 + (1 - \theta) x_2) \le \theta f(x_1) + (1 - \theta) f(x_2)$
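The two-point inequality can be checked numerically. The following is a minimal sketch, assuming the convex function `exp` and the arbitrarily chosen points -1 and 2; the helper name `jensen_gap` is made up for this illustration.

```python
import math

def jensen_gap(f, x1, x2, theta):
    """Right-hand side minus left-hand side of the two-point inequality."""
    chord = theta * f(x1) + (1 - theta) * f(x2)   # theta f(x1) + (1 - theta) f(x2)
    curve = f(theta * x1 + (1 - theta) * x2)      # f(theta x1 + (1 - theta) x2)
    return chord - curve

# exp is convex, so the gap should be nonnegative for every theta in [0, 1].
# x1 = -1 and x2 = 2 are arbitrary example points.
gaps = [jensen_gap(math.exp, -1.0, 2.0, k / 10) for k in range(11)]
print(all(g >= -1e-12 for g in gaps))
```

The gap vanishes at the endpoints $\theta = 0$ and $\theta = 1$ and is strictly positive in between, because `exp` is strictly convex.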

Geometrically, the inequality says that the chord between two points of the graph never lies below the graph itself. It can be used to define and prove convexity of functions, analogously to the definition of convex sets.

### In probability

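If we have a random variable X with probability distribution p(x) over x, the weights $\theta$ and $1 - \theta$ generalize to the probabilities p(x): the convex combination $\theta x_1 + (1 - \theta) x_2$ becomes the expected value $E[X]$ and the right-hand side becomes $E[f(X)]$. In this case Jensen’s inequality reads: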

$f(E[X]) \le E[f(X)]$

For affine functions f equality holds, which tells us that they are both concave and convex.
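As a small illustration of the probabilistic form, the sketch below compares $E[f(X)]$ with $f(E[X])$ for a discrete random variable. The values, the probabilities and the two test functions are assumptions chosen for this example only.

```python
def expectation(values, probs, f=lambda x: x):
    """E[f(X)] for a discrete random variable X."""
    return sum(p * f(x) for x, p in zip(values, probs))

def square(x):
    return x * x           # convex

def affine(x):
    return 3.0 * x - 1.0   # affine, hence both convex and concave

# Hypothetical distribution: P(X = values[i]) = probs[i].
values = [-2.0, 0.0, 1.0, 4.0]
probs = [0.1, 0.4, 0.3, 0.2]

mean = expectation(values, probs)
convex_gap = expectation(values, probs, square) - square(mean)
affine_gap = expectation(values, probs, affine) - affine(mean)
print(convex_gap, affine_gap)
```

For $f(x) = x^2$ the gap $E[X^2] - E[X]^2$ is exactly the variance of X, which gives another way to see that it cannot be negative. For the affine function the gap is zero up to floating-point rounding.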

### Derivation of the ELBO

$\log p(X)$ is the log evidence, X denotes the observed variables and Z the latent variables. q denotes a variational approximation of p(Z|X):

$\log p(X) = \log \int_{Z} p(X,Z) ~ dZ$

$= \log \int_{Z} p(X,Z) \frac{q(Z)}{q(Z)} ~ dZ = \log \int_{Z} \frac{p(X,Z)}{q(Z)} q(Z) ~ dZ$

This is the definition of the expected value with respect to q(Z):

$= \log(E[\frac{p(X,Z)}{q(Z)}])$

Here the concave version of Jensen’s inequality comes into play: for concave functions the inequality is reversed. We can easily convince ourselves that log is a concave function by noting that the area under its curve (its hypograph) is a convex set.

$\log(E[\frac{p(X,Z)}{q(Z)}]) \ge E[\log( \frac{p(X,Z)}{q(Z)} )]$

This can be brought into a nicer form by recalling the definition of entropy:

$H(p) = E_{x\sim p(x)}[\log(\frac{1}{p(x)})] = -E_{x\sim p(x)}[\log ~ p(x)]$

With all expectations taken with respect to q(Z):

$E[\log( \frac{p(X,Z)}{q(Z)} )] = E[\log(p(X,Z)) - \log(q(Z))]$

$= E[\log(p(X,Z))] - E[\log(q(Z))]$

$= E[\log(p(X,Z))] + H(q(Z))$

This lower bound on the log evidence is the ELBO (evidence lower bound).
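The bound can be verified on a toy model with a discrete latent variable, where the integral over Z becomes a sum. This is only a sketch: the joint probabilities below are made-up numbers for a single fixed observation, and the function name `elbo` is chosen for this example.

```python
import math

def elbo(joint, q):
    """E_q[log p(X,Z)] + H(q) for a discrete latent variable Z."""
    expected_log_joint = sum(qz * math.log(pz) for qz, pz in zip(q, joint) if qz > 0)
    entropy = -sum(qz * math.log(qz) for qz in q if qz > 0)
    return expected_log_joint + entropy

# Hypothetical values of p(X = x_obs, Z = z) for z = 0, 1, 2.
joint = [0.10, 0.25, 0.05]

evidence = sum(joint)                        # p(X) = sum over z of p(X, z)
log_evidence = math.log(evidence)
posterior = [pz / evidence for pz in joint]  # p(Z | X)
q_uniform = [1 / 3, 1 / 3, 1 / 3]            # an arbitrary variational guess

print(elbo(joint, q_uniform) <= log_evidence)
print(abs(elbo(joint, posterior) - log_evidence) < 1e-12)
```

The uniform guess gives a strictly smaller value than the log evidence, while choosing q equal to the exact posterior makes the bound tight, because then $\frac{p(X,Z)}{q(Z)} = p(X)$ is constant and Jensen’s inequality holds with equality.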

### Proof that the Kullback-Leibler divergence is always >= 0

The KL divergence between p(x) and q(x) is defined as:

$KL(p(x);q(x)) = -E_{x \sim p(x)}[\log(\frac{q(x)}{p(x)})]$

We can apply the concave version of Jensen’s inequality, $E[\log(Y)] \le \log(E[Y])$, where $Y = \frac{q(x)}{p(x)}$ is the random variable (with x distributed according to p) and log is the concave function. Multiplying by (-1) flips the inequality:

$-E_{x \sim p(x)}[\log(\frac{q(x)}{p(x)})] \ge - \log(E_{x \sim p(x)}[\frac{q(x)}{p(x)}])$

The expectation on the right-hand side simplifies to:

$\log(E_{x \sim p(x)}[\frac{q(x)}{p(x)}]) = \log(\int p(x)\frac{q(x)}{p(x)} ~ dx) = \log(\int q(x) ~ dx)$

q is a pdf, therefore this integral is 1, and log(1) = 0. As a result:

$KL(p(x);q(x)) = -E_{x \sim p(x)}[\log(\frac{q(x)}{p(x)})] \ge 0$
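The non-negativity can also be checked empirically for discrete distributions, where the integral becomes a sum. The sketch below assumes that q is strictly positive wherever p is; the helper names and the example distributions are made up for this illustration.

```python
import math
import random

def kl_divergence(p, q):
    """KL(p; q) = -E_{x~p}[log(q(x)/p(x))] for discrete distributions.

    Assumes q[i] > 0 wherever p[i] > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def random_distribution(rng, n):
    """A random strictly positive probability vector of length n."""
    weights = [rng.random() + 1e-3 for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]

p = [0.5, 0.3, 0.2]
q = [0.2, 0.2, 0.6]
print(kl_divergence(p, q), kl_divergence(q, p), kl_divergence(p, p))

# Smallest KL value over many random pairs of distributions.
rng = random.Random(0)
smallest = min(
    kl_divergence(random_distribution(rng, 5), random_distribution(rng, 5))
    for _ in range(1000)
)
print(smallest >= 0)
```

The first line of output also shows that the divergence is not symmetric in its arguments and that it is zero when both distributions coincide.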