Optimal Transport
It can be hard to imagine a distance between two distinct probability distributions. In theory, though, one distribution can be moved onto the other through a transportation of probability mass. Optimal transport, in this context, simply means finding the transportation that reaches the other distribution at the lowest total cost, that is, over the shortest distance. Such measures of how far apart two distributions are are often referred to as divergences.
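To make the idea of a cheapest transportation concrete, here is a minimal sketch for the simplest setting: two one-dimensional samples of equal size, where moving a unit of mass from \(x\) to \(y\) costs \(|x - y|\). In that setting, matching the sorted points of one sample to the sorted points of the other gives the lowest total cost. The function name `transport_cost_1d` and the sample values are made up for illustration, and the equal-size restriction is an assumption of this sketch, not of optimal transport in general.

```python
def transport_cost_1d(xs, ys):
    """Lowest average cost of moving the sample xs onto the sample ys.

    Assumes both samples are one-dimensional, non-empty, and of equal
    size, and that moving a point from x to y costs |x - y|.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("this sketch assumes non-empty, equal-size samples")
    # In one dimension with cost |x - y|, pairing the i-th smallest source
    # point with the i-th smallest target point is an optimal matching.
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(x - y) for x, y in pairs) / len(xs)


source = [0.0, 1.0, 2.0]
target = [3.0, 5.0, 4.0]
cost = transport_cost_1d(source, target)
print(cost)
```

Every point in this toy example ends up moving three units, so the average cost is 3. Two samples that contain the same points in a different order have a cost of zero, since no mass needs to move.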
Divergence
Kullback-Leibler Divergence
\[
D_{KL}(P \,\|\, Q) = \int p(x) \log\left(\frac{p(x)}{q(x)}\right) dx
\]
You may also think of \(P\) and \(Q\) as probability measures on a measurable space \(\mathcal{X}\), with densities \(p\) and \(q\) and \(x\in\mathcal{X}\). The divergence is then said to be the relative entropy from \(Q\) to \(P\).
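As a small illustration, the sketch below evaluates the discrete analogue of the integral, \(\sum_i p_i \log(p_i / q_i)\), for two probability vectors. The function name `kl_divergence` and the example vectors are chosen only for illustration. The sketch assumes both inputs are already normalized, uses the usual convention that terms with \(p_i = 0\) contribute nothing, and returns infinity when \(q_i = 0\) while \(p_i > 0\).

```python
import math


def kl_divergence(p, q):
    """Discrete KL divergence D_KL(P || Q), in nats.

    Assumes p and q are probability vectors of equal length that each
    sum to 1; this is not checked here.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # convention: 0 * log(0 / q) = 0
        if qi == 0.0:
            return math.inf  # P puts mass where Q has none
        total += pi * math.log(pi / qi)
    return total


p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, q))  # divergence of P from Q
print(kl_divergence(q, p))  # a different value: KL is not symmetric
```

The two printed values differ, which shows that the Kullback-Leibler divergence is not symmetric in its arguments and is therefore not a distance in the strict metric sense.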