Been into the chain rule lately. Something really basic but useful just clicked for me (I’m slow). The (higher-dimensional generalization of the) chain rule is just a matrix product (i.e. just a bunch of inner products)!

The basic chain rule is this:

$\frac{\partial y}{\partial x} = \frac{\partial y}{\partial z} \frac{\partial z}{\partial x}$


The cool thing is that this is the chain rule no matter what the dimensions of $x$, $y$, and $z$ are (as long as you keep the factors in this order, since matrix products don't commute). So if $y$ is an $n$-dimensional vector and $x$ is an $m$-dimensional vector, then the derivative of $y$ with respect to $x$ is an $n \times m$ matrix called the Jacobian matrix,

$\frac{\partial y}{\partial x} = \begin{bmatrix} \dfrac{\partial y_1}{\partial x_1} & \cdots & \dfrac{\partial y_1}{\partial x_m} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial y_n}{\partial x_1} & \cdots & \dfrac{\partial y_n}{\partial x_m} \end{bmatrix}$


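So no matter how multidimensional your problem, the rate of change of $y$ with respect to $x$ is just the (matrix) product of the rate of change of $y$ with respect to $z$ and the rate of change of $z$ with respect to $x$. If $z$ is $k$-dimensional, then $\frac{\partial y}{\partial z}$ is $n \times k$ and $\frac{\partial z}{\partial x}$ is $k \times m$, so the shapes line up and the product is $n \times m$. Each entry of it is the inner product of a row of the first matrix with a column of the second: $\frac{\partial y_i}{\partial x_j} = \sum_{l=1}^{k} \frac{\partial y_i}{\partial z_l} \frac{\partial z_l}{\partial x_j}$. All very linear and lovely.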
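Here's a quick numerical sanity check of that claim, as a rough sketch rather than anything definitive. The functions `f` and `g` are made up for illustration (they are not from anywhere in particular), and the Jacobians are approximated with central differences, so the two sides only agree up to a small numerical error.

```python
import math

def g(x):
    """Inner map z = g(x), from R^2 to R^3 (made-up example)."""
    x1, x2 = x
    return [x1 * x2, math.sin(x1), x2 ** 2]

def f(z):
    """Outer map y = f(z), from R^3 to R^2 (made-up example)."""
    z1, z2, z3 = z
    return [z1 + z2 * z3, math.exp(z1) - z3]

def jacobian(func, point, h=1e-6):
    """Central-difference Jacobian: rows index outputs, columns index inputs."""
    n_out = len(func(point))
    J = [[0.0] * len(point) for _ in range(n_out)]
    for j in range(len(point)):
        up = list(point)
        up[j] += h
        down = list(point)
        down[j] -= h
        f_up, f_down = func(up), func(down)
        for i in range(n_out):
            J[i][j] = (f_up[i] - f_down[i]) / (2 * h)
    return J

def matmul(A, B):
    """Each entry is the inner product of a row of A with a column of B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

x = [0.7, -1.3]
z = g(x)

dy_dz = jacobian(f, z)                    # 2 x 3
dz_dx = jacobian(g, x)                    # 3 x 2
chained = matmul(dy_dz, dz_dx)            # 2 x 2, via the chain rule
direct = jacobian(lambda v: f(g(v)), x)   # 2 x 2, differentiating the composition

# Largest entrywise disagreement between the two Jacobians.
max_gap = max(abs(c - d) for rc, rd in zip(chained, direct) for c, d in zip(rc, rd))
print(max_gap)
```

The printed gap should be tiny (finite-difference noise), which is the chain rule showing up as a plain matrix product.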

This is the coolest part for me: the chain rule is just a bunch of inner products, and so are moments from probability theory (e.g. means, variances, covariances), so the two are closely connected. Related posts: here and here.
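To make the "moments are inner products" part concrete, here's a minimal sketch on made-up data. It assumes the population convention (divide by $n$, not $n - 1$) and treats each sample as a plain vector; the helper name `inner` is just for illustration.

```python
def inner(u, v):
    """Plain inner (dot) product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

# Two made-up samples of the same length.
a = [2.0, 4.0, 6.0, 8.0]
b = [1.0, 3.0, 2.0, 6.0]
n = len(a)
ones = [1.0] * n

# Mean: inner product with the all-ones vector, divided by n.
mean_a = inner(a, ones) / n
mean_b = inner(b, ones) / n

# Center each sample, then variance and covariance are inner products too.
a_c = [v - mean_a for v in a]
b_c = [v - mean_b for v in b]
var_a = inner(a_c, a_c) / n    # inner product of a centered vector with itself
cov_ab = inner(a_c, b_c) / n   # inner product of two centered vectors

print(mean_a, var_a, cov_ab)   # 5.0 5.0 3.5
```

Same operation as each entry of the Jacobian product above: multiply elementwise, then sum.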