The matrix normal distribution is really cool. The normal distribution is for scalars, the multivariate normal is for vectors, and the matrix normal is for matrices. Every discussion of the matrix normal I've come across, though, is pitched at a mathematically higher level than I'd like. So here's a simple characterization of how the densities of these three distributions relate: since we all understand the normal and the multivariate normal, this should help us understand the matrix normal.

What they have in common is that each density is proportional to the exponential of a simple function, $\exp(-\frac{1}{2}\mathcal{L}(x))$. The main difference is that $x$ is a scalar, a vector, or a matrix, and $\mathcal{L}(x)$ has a correspondingly different definition.

Here’s the scalar normal (with mean zero): $\mathcal{L}(x) = a x^2$.
Here’s the vector (i.e. multivariate) normal (also mean zero): $\mathcal{L}(x) = \sum_i \sum_j a_{ij} x_i x_j$.
And here’s the matrix normal (again mean zero): $\mathcal{L}(x) = \sum_i \sum_j \sum_k \sum_l a_{ik} b_{jl} x_{ij} x_{kl}$.
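To make the three definitions concrete, here's a quick numerical sketch. The particular matrices `A` and `B` (standing in for the $a$'s and $b$'s) are just arbitrary symmetric positive definite examples I made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# Scalar case: L(x) = a * x**2. Here a is just some positive number.
a = 2.0
x = 0.7
L_scalar = a * x**2

# Vector case: L(x) = sum_ij a_ij x_i x_j, i.e. the quadratic form x^T A x.
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])             # symmetric positive definite
v = rng.standard_normal(2)
L_vector = np.einsum('ij,i,j->', A, v, v)

# Matrix case: L(x) = sum_ijkl a_ik b_jl x_ij x_kl.
B = np.array([[1.5, 0.2, 0.0],
              [0.2, 1.0, 0.3],
              [0.0, 0.3, 2.0]])        # symmetric positive definite
X = rng.standard_normal((2, 3))
L_matrix = np.einsum('ik,jl,ij,kl->', A, B, X, X)

# The four sums collapse to a trace in matrix notation:
assert np.isclose(L_matrix, np.trace(B @ X.T @ A @ X))
```

The last line is worth noticing: the four-fold sum in the matrix case is exactly $\mathrm{tr}(B x^\top A x)$ (for symmetric $B$), which is the compact form you'll usually see in write-ups of the matrix normal.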

Einstein would probably drop the summation signs or something. Wondering where the variances and covariances are? They’re just functions of the $a$‘s and $b$‘s: in the vector case, for example, the matrix of $a_{ij}$‘s is the inverse of the covariance matrix. As always… beware of typos and whatnot.
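Here's a sketch of how the $a$'s and $b$'s carry the covariances in the matrix case, assuming the standard convention where $A$ and $B$ are the inverses of an among-row covariance $U$ and an among-column covariance $V$ (the specific $U$ and $V$ below are made-up examples). The point: the matrix-normal $\mathcal{L}$ is exactly the multivariate-normal $\mathcal{L}$ applied to $\mathrm{vec}(x)$ with precision matrix $B \otimes A$, so the covariance of $\mathrm{vec}(x)$ is $V \otimes U$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed convention: A = inv(U), B = inv(V), where U is the among-row
# covariance and V the among-column covariance of the matrix normal.
U = np.array([[1.0, 0.3],
              [0.3, 2.0]])
V = np.array([[1.5, 0.2, 0.0],
              [0.2, 1.0, 0.3],
              [0.0, 0.3, 2.0]])
A = np.linalg.inv(U)
B = np.linalg.inv(V)

X = rng.standard_normal((2, 3))
vecX = X.flatten('F')                  # vec(X): stack the columns of X

# Matrix-normal L(x) equals the multivariate-normal L at vec(x)
# with precision matrix kron(B, A):
lhs = vecX @ np.kron(B, A) @ vecX
rhs = np.trace(B @ X.T @ A @ X)
assert np.isclose(lhs, rhs)
```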

EDIT: Added the (purely conventional) factor of $-\frac{1}{2}$ to $\exp(-\frac{1}{2}\mathcal{L}(x))$. Also note that there are conditions on the $a$‘s and $b$‘s (symmetry and positive definiteness, essentially) that are too much to get into in this post (see here for the kind of thing I’m talking about).
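Without getting into the theory, here's a minimal numerical check of the kind of condition at issue: the $a$'s and $b$'s, viewed as matrices, need to be symmetric positive definite for $\exp(-\frac{1}{2}\mathcal{L}(x))$ to be a normalizable density. (The helper name and example matrices are mine, purely for illustration.)

```python
import numpy as np

def is_valid_precision(M):
    """True if M is symmetric positive definite (i.e. Cholesky succeeds)."""
    if not np.allclose(M, M.T):
        return False
    try:
        np.linalg.cholesky(M)
        return True
    except np.linalg.LinAlgError:
        return False

assert is_valid_precision(np.array([[2.0, 0.5], [0.5, 1.0]]))       # OK
assert not is_valid_precision(np.array([[1.0, 2.0], [2.0, 1.0]]))   # indefinite
```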

UPDATE: This pattern of generalization can continue forever: give $x$ one more index and give $\mathcal{L}$ one more factor matrix at each step.
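For instance, the next rung, sometimes called the tensor normal, has a three-index $x$ with one factor matrix per index: $\mathcal{L}(x) = \sum a_{il}\, b_{jm}\, c_{kn}\, x_{ijk}\, x_{lmn}$. A sketch (the factor matrices here are arbitrary symmetric positive definite examples):

```python
import numpy as np

rng = np.random.default_rng(2)

def spd(d, rng):
    """An arbitrary symmetric positive definite matrix, for illustration."""
    M = rng.standard_normal((d, d))
    return M @ M.T + d * np.eye(d)

# Three-index case: L(x) = sum a_il b_jm c_kn x_ijk x_lmn,
# with one factor matrix per index of x.
A, B, C = spd(2, rng), spd(3, rng), spd(4, rng)
X = rng.standard_normal((2, 3, 4))
L = np.einsum('il,jm,kn,ijk,lmn->', A, B, C, X, X)
assert L > 0   # positive-definite factors make L(x) > 0 for any nonzero x
```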