Just a quick cheatsheet on derivatives (of scalars and vectors) with respect to a vector. This is borrowed from the Wikipedia page on Matrix Calculus.
In print, the following notation is usually used:
A : Matrix (capital and bold)
b : Vector (small and bold)
c : scalar (small and not bold)
The rules for derivatives of a scalar by a vector
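For reference, here is a sketch of the most common scalar-by-vector identities. It assumes the denominator layout (the gradient is a column vector with the same shape as x), and that a, b and A are constants that do not depend on x. Those symbol names are chosen here for illustration.

```latex
\begin{align*}
\frac{\partial\, \mathbf{a}^{\top}\mathbf{x}}{\partial \mathbf{x}}
  &= \frac{\partial\, \mathbf{x}^{\top}\mathbf{a}}{\partial \mathbf{x}} = \mathbf{a} \\
\frac{\partial\, \mathbf{x}^{\top}\mathbf{x}}{\partial \mathbf{x}}
  &= 2\mathbf{x} \\
\frac{\partial\, \mathbf{b}^{\top}\mathbf{A}\mathbf{x}}{\partial \mathbf{x}}
  &= \mathbf{A}^{\top}\mathbf{b} \\
\frac{\partial\, \mathbf{x}^{\top}\mathbf{A}\mathbf{x}}{\partial \mathbf{x}}
  &= (\mathbf{A} + \mathbf{A}^{\top})\,\mathbf{x}
  \quad (= 2\mathbf{A}\mathbf{x} \text{ when } \mathbf{A} \text{ is symmetric})
\end{align*}
```

In the numerator layout each right-hand side is transposed, so the results are row vectors instead.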
The rules for derivatives of a vector by a vector
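Likewise, a sketch of the basic vector-by-vector identities, again assuming the denominator layout, where entry (i, j) of the result is the derivative of the j-th output component with respect to x_i. Here a and A are assumed constant with respect to x.

```latex
\begin{align*}
\frac{\partial\, \mathbf{a}}{\partial \mathbf{x}} &= \mathbf{0} \\
\frac{\partial\, \mathbf{x}}{\partial \mathbf{x}} &= \mathbf{I} \\
\frac{\partial\, \mathbf{A}\mathbf{x}}{\partial \mathbf{x}} &= \mathbf{A}^{\top} \\
\frac{\partial\, \mathbf{x}^{\top}\mathbf{A}}{\partial \mathbf{x}} &= \mathbf{A}
\end{align*}
```

In the numerator layout the last two results swap: the derivative of Ax is A, which is the usual Jacobian.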
Using these rules, I have derived the commonly used least squares equation. The error function is a scalar, and the optimization variable (the unknown) is a vector. We want to minimize the error, so we take the gradient (the vector derivative) of the error with respect to the unknown and set it to the zero vector. Because the least squares error is convex, any value of the unknown that makes the gradient zero is a minimizer of the error.
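As a quick sanity check, here is a minimal NumPy sketch. It assumes the error has the standard form E(x) = ||Ax - b||^2 = x^T A^T A x - 2 b^T A x + b^T b. Applying the scalar-by-vector rules gives the gradient 2 A^T (Ax - b), and setting it to zero gives the normal equations A^T A x = A^T b. The data and the names `error`, `gradient` and `numerical_gradient` are made up for illustration.

```python
import numpy as np

# Hypothetical data: 20 observations, 3 unknowns.
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 3))
b = rng.normal(size=20)

def error(x):
    """Scalar least squares error ||Ax - b||^2."""
    r = A @ x - b
    return r @ r

def gradient(x):
    """Analytic gradient from the matrix calculus rules: 2 A^T (Ax - b)."""
    return 2 * A.T @ (A @ x - b)

def numerical_gradient(f, x, h=1e-6):
    """Central finite differences, used only to check the analytic gradient."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

# Solve the normal equations A^T A x = A^T b (gradient set to zero).
x_star = np.linalg.solve(A.T @ A, A.T @ b)

# Compare the analytic and numerical gradients at an arbitrary point.
x0 = rng.normal(size=3)
print("max gradient mismatch:", np.max(np.abs(gradient(x0) - numerical_gradient(error, x0))))
print("gradient norm at x_star:", np.linalg.norm(gradient(x_star)))
```

Solving the normal equations directly is fine for a small, well-conditioned problem like this one. For real data, `np.linalg.lstsq` is usually the safer choice because it avoids forming A^T A explicitly.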
Hope this helps!