# Still lovin’ the chain rule…

Been into the chain rule lately. Something really basic but useful just clicked for me (I’m slow). The (higher-dimensional generalization of the) chain rule is just a matrix product (i.e. just a bunch of inner products)!

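The basic chain rule is this:

$$\frac{\partial z}{\partial x} = \frac{\partial z}{\partial y}\,\frac{\partial y}{\partial x}$$

The cool thing is that this is the chain rule no matter what the dimensions of $x$, $y$, and $z$ are. So if $y$ is an $m$-dimensional vector and $x$ is an $n$-dimensional vector, then the partial derivative between them is an $m \times n$ matrix called the Jacobian matrix,

$$\frac{\partial y}{\partial x} =
\begin{bmatrix}
\frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_1}{\partial x_n} \\
\vdots & \ddots & \vdots \\
\frac{\partial y_m}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_n}
\end{bmatrix}$$

So no matter how multidimensional your problem, the rate of change of $z$ with respect to $x$ is just the (matrix) product of the rate of change of $z$ with respect to $y$ and the rate of change of $y$ with respect to $x$. All very linear and lovely.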
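To see the matrix-product claim in action, here is a minimal sketch in plain Python. Everything in it is made up for illustration: `f` and `g` are arbitrary toy functions (so that `y = f(x)` and `z = g(y)`), `jacobian` is a simple central-difference approximation rather than an exact derivative, and `matmul` spells the matrix product out as inner products of rows with columns. It compares the product of the two Jacobians against the Jacobian of the composition computed directly.

```python
import math

def f(x):
    # Toy map y = f(x) from R^2 to R^3 (chosen arbitrarily for illustration).
    x1, x2 = x
    return [x1 * x2, math.sin(x1), x2 ** 2]

def g(y):
    # Toy map z = g(y) from R^3 to R^2 (also arbitrary).
    y1, y2, y3 = y
    return [y1 + y2 * y3, y1 * y3]

def jacobian(func, point, eps=1e-6):
    # Central-difference approximation: J[i][j] ~ d func_i / d point_j.
    m, n = len(func(point)), len(point)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        up, down = list(point), list(point)
        up[j] += eps
        down[j] -= eps
        f_up, f_down = func(up), func(down)
        for i in range(m):
            J[i][j] = (f_up[i] - f_down[i]) / (2 * eps)
    return J

def matmul(A, B):
    # Each entry is the inner product of a row of A with a column of B.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

x = [0.7, -1.3]
y = f(x)

dz_dy = jacobian(g, y)                    # 2 x 3
dy_dx = jacobian(f, x)                    # 3 x 2
chain = matmul(dz_dy, dy_dx)              # 2 x 2, via the chain rule
direct = jacobian(lambda v: g(f(v)), x)   # 2 x 2, differentiating g(f(x)) directly

# Largest entrywise disagreement between the two 2 x 2 matrices.
max_err = max(abs(c - d)
              for row_c, row_d in zip(chain, direct)
              for c, d in zip(row_c, row_d))
print("max difference:", max_err)
```

The two matrices should agree up to the small error of the finite-difference approximation. Note that the shapes line up exactly as the chain rule requires: a 2 x 3 Jacobian times a 3 x 2 Jacobian gives the 2 x 2 Jacobian of the composition.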

This is the coolest part for me: because the chain rule is just a bunch of inner products, it’s closely connected with moments from probability theory (e.g. means, variances, covariances), which are also just inner products. Related posts: here and here.