Just a quick cheatsheet on derivatives (of scalars and vectors) with respect to a vector. This is borrowed from the wiki page on Matrix Calculus.

In print, the following notation is usually used:

**A** : Matrix (capital and bold)

**b** : Vector (small and bold)

c : Scalar (small and not bold)

## The rules for derivatives of a scalar by a vector
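As a hedged reminder, here are a few standard scalar-by-vector identities. They assume the denominator layout (the gradient is a column vector of the same shape as $\mathbf{x}$) and that $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{A}$ do not depend on $\mathbf{x}$. These symbols are chosen here for illustration and follow the notation above.

```latex
\begin{aligned}
\frac{\partial\, \mathbf{a}^{T}\mathbf{x}}{\partial \mathbf{x}}
  &= \frac{\partial\, \mathbf{x}^{T}\mathbf{a}}{\partial \mathbf{x}} = \mathbf{a} \\
\frac{\partial\, \mathbf{x}^{T}\mathbf{x}}{\partial \mathbf{x}}
  &= 2\,\mathbf{x} \\
\frac{\partial\, \mathbf{b}^{T}\mathbf{A}\mathbf{x}}{\partial \mathbf{x}}
  &= \mathbf{A}^{T}\mathbf{b} \\
\frac{\partial\, \mathbf{x}^{T}\mathbf{A}\mathbf{x}}{\partial \mathbf{x}}
  &= (\mathbf{A} + \mathbf{A}^{T})\,\mathbf{x}
  \quad (= 2\,\mathbf{A}\mathbf{x} \text{ when } \mathbf{A} \text{ is symmetric})
\end{aligned}
```

In the numerator layout the same results appear transposed, as row vectors.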

## The rules for derivatives of a vector by a vector
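As a hedged sketch, here are a few standard vector-by-vector identities. They again assume the denominator layout, where for $\mathbf{y} \in \mathbb{R}^{m}$ and $\mathbf{x} \in \mathbb{R}^{n}$ the derivative $\partial \mathbf{y} / \partial \mathbf{x}$ is an $n \times m$ matrix with entry $(i, j)$ equal to $\partial y_j / \partial x_i$. The vector $\mathbf{a}$ and the matrix $\mathbf{A}$ are assumed not to depend on $\mathbf{x}$.

```latex
\begin{aligned}
\frac{\partial\, \mathbf{a}}{\partial \mathbf{x}} &= \mathbf{0} \\
\frac{\partial\, \mathbf{x}}{\partial \mathbf{x}} &= \mathbf{I} \\
\frac{\partial\, \mathbf{A}\mathbf{x}}{\partial \mathbf{x}} &= \mathbf{A}^{T} \\
\frac{\partial\, \mathbf{x}^{T}\mathbf{A}}{\partial \mathbf{x}} &= \mathbf{A}
\end{aligned}
```

In the numerator layout each right-hand side is transposed, so $\partial(\mathbf{A}\mathbf{x})/\partial\mathbf{x} = \mathbf{A}$ there. It is worth checking which convention a given reference uses before mixing formulas.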

Using these rules, I have derived the commonly used least squares equation. The error function is a scalar, and the optimization variable (the unknown) is a vector. We want to minimize the error, so we take the gradient (the vector derivative) of the error with respect to the optimization variable and set it to the zero vector. Because the least squares error is convex, the values of the unknown that make the gradient zero are also the ones that minimize the error.
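The derivation can be sketched as follows. This assumes the usual setup, which is not spelled out above: the error is $E(\mathbf{x}) = \lVert \mathbf{A}\mathbf{x} - \mathbf{b} \rVert^{2}$, with $\mathbf{A}$ a known matrix, $\mathbf{b}$ a known vector and $\mathbf{x}$ the unknown. It uses two denominator-layout identities: $\partial(\mathbf{x}^{T}\mathbf{M}\mathbf{x})/\partial\mathbf{x} = (\mathbf{M} + \mathbf{M}^{T})\mathbf{x}$ and $\partial(\mathbf{b}^{T}\mathbf{A}\mathbf{x})/\partial\mathbf{x} = \mathbf{A}^{T}\mathbf{b}$.

```latex
\begin{aligned}
E(\mathbf{x})
  &= (\mathbf{A}\mathbf{x} - \mathbf{b})^{T}(\mathbf{A}\mathbf{x} - \mathbf{b}) \\
  &= \mathbf{x}^{T}\mathbf{A}^{T}\mathbf{A}\mathbf{x}
     - 2\,\mathbf{b}^{T}\mathbf{A}\mathbf{x}
     + \mathbf{b}^{T}\mathbf{b} \\[4pt]
\frac{\partial E}{\partial \mathbf{x}}
  &= 2\,\mathbf{A}^{T}\mathbf{A}\mathbf{x} - 2\,\mathbf{A}^{T}\mathbf{b}
   = \mathbf{0} \\[4pt]
\mathbf{A}^{T}\mathbf{A}\,\mathbf{x}
  &= \mathbf{A}^{T}\mathbf{b} \\[4pt]
\mathbf{x}
  &= (\mathbf{A}^{T}\mathbf{A})^{-1}\mathbf{A}^{T}\mathbf{b}
\end{aligned}
```

The second line uses the fact that $\mathbf{x}^{T}\mathbf{A}^{T}\mathbf{b}$ is a scalar and so equals its transpose $\mathbf{b}^{T}\mathbf{A}\mathbf{x}$. The quadratic term gives $2\,\mathbf{A}^{T}\mathbf{A}\mathbf{x}$ because $\mathbf{A}^{T}\mathbf{A}$ is symmetric. The last step holds only when $\mathbf{A}^{T}\mathbf{A}$ is invertible, that is, when $\mathbf{A}$ has full column rank.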

Hope this helps!