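Preface
The previous post, Matrix Derivatives (1), settled the layout question, which is the most basic piece of matrix differentiation. This post moves on to the core of the topic: the basic differentiation rules and the basic formulas.
Basic conventions
This post covers only derivatives of a scalar with respect to a vector or a matrix, and vectors are column vectors by default. In the formulas below, x is an n × 1 column vector, f and g are scalar-valued functions of x, and c_0, c_1, c_2 are constants.
Derivative of a scalar with respect to a vector
Basic rules
Derivative of a constant: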
$$
\frac{\partial c_0}{\partial x} = 0^{n \times 1}
$$
The constant rule is straightforward, so its proof is omitted.
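As a quick sanity check of the constant rule, here is a minimal sketch that approximates the derivative of a scalar with respect to a vector by central differences. The helper `numerical_gradient`, the step size `h`, and the sample point `x0` are assumptions made for illustration, not part of the original post. A Python list of n floats stands in for the n × 1 column vector.

```python
def numerical_gradient(f, x, h=1e-6):
    """Approximate the gradient of a scalar function f at x by central differences.

    x is a list of n floats standing in for an n x 1 column vector.
    The result is a list of the n partial derivatives, in the same order.
    """
    grad = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h
        backward[i] -= h
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad


def constant(x):
    # c_0 = 5.0 does not depend on any component of x.
    return 5.0


x0 = [1.0, -2.0, 0.5]
grad_constant = numerical_gradient(constant, x0)
print(grad_constant)  # [0.0, 0.0, 0.0]
```

The result has one entry per component of x, which matches the n × 1 shape of the zero vector in the rule.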
Linear combination:
$$
\frac{\partial (c_1 f(x) + c_2 g(x))}{\partial x} = c_1 \frac{\partial f}{\partial x} + c_2 \frac{\partial g}{\partial x}
$$
Proof:
$$
\begin{aligned}
\frac{\partial (c_1 f(x) + c_2 g(x))}{\partial x}
&= \begin{bmatrix}
\frac{\partial (c_1 f(x) + c_2 g(x))}{\partial x_1} \\
\frac{\partial (c_1 f(x) + c_2 g(x))}{\partial x_2} \\
\vdots \\
\frac{\partial (c_1 f(x) + c_2 g(x))}{\partial x_n}
\end{bmatrix} \\[2ex]
&= \begin{bmatrix}
c_1 \frac{\partial f(x)}{\partial x_1} \\
c_1 \frac{\partial f(x)}{\partial x_2} \\
\vdots \\
c_1 \frac{\partial f(x)}{\partial x_n}
\end{bmatrix}
+ \begin{bmatrix}
c_2 \frac{\partial g(x)}{\partial x_1} \\
c_2 \frac{\partial g(x)}{\partial x_2} \\
\vdots \\
c_2 \frac{\partial g(x)}{\partial x_n}
\end{bmatrix} \\[2ex]
&= c_1 \frac{\partial f}{\partial x} + c_2 \frac{\partial g}{\partial x}
\end{aligned}
$$
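Addition and subtraction need no separate treatment: they behave exactly as in ordinary differentiation (take $c_1 = 1$ and $c_2 = \pm 1$ in the rule above) and are just as easy to prove.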
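The linear-combination rule can also be checked numerically. The sketch below is only an illustration under assumptions: the functions `f` and `g`, the constants `c1` and `c2`, the point `x0`, and the finite-difference helper are all made up for this example. It compares the gradient of c_1 f + c_2 g with c_1 times the gradient of f plus c_2 times the gradient of g.

```python
import math


def numerical_gradient(f, x, h=1e-6):
    """Central-difference gradient; x is a list standing in for a column vector."""
    grad = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h
        backward[i] -= h
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad


# Two hypothetical scalar functions of a 2 x 1 vector x.
def f(x):
    return x[0] ** 2 + x[1]


def g(x):
    return math.sin(x[0]) * x[1]


c1, c2 = 3.0, -2.0


def combined(x):
    return c1 * f(x) + c2 * g(x)


x0 = [0.7, 1.3]

# Left-hand side: gradient of the linear combination.
lhs = numerical_gradient(combined, x0)

# Right-hand side: the same linear combination of the two gradients.
grad_f = numerical_gradient(f, x0)
grad_g = numerical_gradient(g, x0)
rhs = [c1 * df + c2 * dg for df, dg in zip(grad_f, grad_g)]

max_gap = max(abs(a - b) for a, b in zip(lhs, rhs))
print(max_gap)  # expected to be tiny (floating-point noise only)
```

Setting `c1, c2 = 1.0, -1.0` turns the same check into one for subtraction.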
Product:
$$
\frac{\partial (f(x) g(x))}{\partial x} = \frac{\partial f(x)}{\partial x} g(x) + f(x) \frac{\partial g(x)}{\partial x}
$$
Proof:
$$
\begin{aligned}
\frac{\partial (f(x) g(x))}{\partial x}
&= \begin{bmatrix}
\frac{\partial fg}{\partial x_1} \\
\frac{\partial fg}{\partial x_2} \\
\vdots \\
\frac{\partial fg}{\partial x_n}
\end{bmatrix} \\[2ex]
&= \begin{bmatrix}
\frac{\partial f}{\partial x_1} g + f \frac{\partial g}{\partial x_1} \\
\frac{\partial f}{\partial x_2} g + f \frac{\partial g}{\partial x_2} \\
\vdots \\
\frac{\partial f}{\partial x_n} g + f \frac{\partial g}{\partial x_n}
\end{bmatrix} \\[2ex]
&= \begin{bmatrix}
\frac{\partial f}{\partial x_1} \\
\frac{\partial f}{\partial x_2} \\
\vdots \\
\frac{\partial f}{\partial x_n}
\end{bmatrix} g
+ f \begin{bmatrix}
\frac{\partial g}{\partial x_1} \\
\frac{\partial g}{\partial x_2} \\
\vdots \\
\frac{\partial g}{\partial x_n}
\end{bmatrix} \\[2ex]
&= \frac{\partial f(x)}{\partial x} g(x) + f(x) \frac{\partial g(x)}{\partial x}
\end{aligned}
$$
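The product rule can be sanity-checked the same way. The following is a rough sketch under assumptions: `f`, `g`, the point `x0`, and the finite-difference helper are hypothetical choices for illustration. Because f(x) and g(x) are scalars, multiplying a gradient by them just scales every component.

```python
import math


def numerical_gradient(f, x, h=1e-6):
    """Central-difference gradient; x is a list standing in for a column vector."""
    grad = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h
        backward[i] -= h
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad


# Two hypothetical scalar functions of a 2 x 1 vector x.
def f(x):
    return x[0] + 2.0 * x[1]


def g(x):
    return math.exp(x[0]) * x[1]


def product(x):
    return f(x) * g(x)


x0 = [0.3, -1.1]

# Left-hand side: gradient of the product f(x) g(x).
lhs = numerical_gradient(product, x0)

# Right-hand side: (df/dx) g(x) + f(x) (dg/dx), component by component.
grad_f = numerical_gradient(f, x0)
grad_g = numerical_gradient(g, x0)
rhs = [df * g(x0) + f(x0) * dg for df, dg in zip(grad_f, grad_g)]

max_gap = max(abs(a - b) for a, b in zip(lhs, rhs))
print(max_gap)  # expected to be small (finite-difference and rounding error)
```

A small gap is consistent with the rule but does not replace the component-wise proof above.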