跳转至

矩阵微分

梯度

定义:\(\nabla_\textbf xf(\textbf x)=\left[\dfrac{\partial f(\textbf x)}{\partial x_1},\dfrac{\partial f(\textbf x)}{\partial x_2},\cdots,\dfrac{\partial f(\textbf x)}{\partial x_n}\right]^\mathsf{T}\)

性质:

  • \(\nabla_\mathbf x\mathbf {Ax}=\mathbf{A}^\mathsf T\)
  • \(\nabla_\mathbf x\mathbf x^\mathsf T\mathbf A=\mathbf A\)
  • \(\nabla_\mathbf x\mathbf x^\mathsf T\mathbf{Ax}=(\mathbf A+\mathbf A^\mathsf T)\mathbf x\)
  • \(\nabla_\mathbf x\Vert\mathbf x\Vert^2=\nabla_\mathbf x\mathbf x^\mathsf T\mathbf x=2\mathbf x\)
  • \(\nabla_\mathbf X\Vert\mathbf X\Vert^2_F=2\mathbf X\)

标量对向量求导

公式:\(\mathbf x=\begin{bmatrix}x_1\\x_2\\\vdots\\x_n\end{bmatrix}\qquad\dfrac{\partial y}{\partial\mathbf x}=\begin{bmatrix}\dfrac{\partial y}{\partial x_1}\ \dfrac{\partial y}{\partial x_2}\ \cdots\dfrac{\partial y}{\partial x_n}\end{bmatrix}\)

该符号称为分子布局符号

性质:

  • \(\dfrac{\partial (u+v)}{\partial\mathbf x}=\dfrac{\partial u}{\partial\mathbf x}+\dfrac{\partial v}{\partial\mathbf x}\)
  • \(\dfrac{\partial(uv)}{\partial\mathbf x}=\dfrac{\partial u}{\partial\mathbf x}v+\dfrac{\partial v}{\partial\mathbf x}u\)
  • \(\dfrac{\partial\braket\mathbf{u,v}}{\partial\mathbf x}=\mathbf u^\mathsf T\dfrac{\partial\mathbf v}{\partial\mathbf x}+\partial\mathbf v^\mathsf T\dfrac{\partial\mathbf u}{\partial\mathbf x}\)

向量对标量求导

公式:\(\mathbf y=\begin{bmatrix}y_1\\y_2\\\vdots\\y_n\end{bmatrix}\qquad\dfrac{\partial\mathbf y}{\partial x}=\begin{bmatrix}\dfrac{\partial y_1}{\partial x}\\\dfrac{\partial y_2}{\partial x}\\\vdots\\\dfrac{\partial y_m}{\partial x}\end{bmatrix}\)

向量对向量求导

公式:\(\mathbf x=\begin{bmatrix}x_1\\x_2\\\vdots\\x_n\end{bmatrix}\qquad\mathbf y=\begin{bmatrix}y_1\\y_2\\\vdots\\y_n\end{bmatrix}\) \(\dfrac{\partial\mathbf y}{\partial\mathbf x}=\begin{bmatrix}\dfrac{\partial y_1}{\partial\mathbf x}\\\dfrac{\partial y_2}{\partial\mathbf x}\\\vdots\\\dfrac{\partial y_m}{\partial\mathbf x}\end{bmatrix}=\begin{bmatrix}\dfrac{\partial y_1}{\partial x_1}&\dfrac{\partial y_1}{\partial x_2}&\cdots&\dfrac{\partial y_1}{\partial x_n}\\\dfrac{\partial y_2}{\partial x_1}&\dfrac{\partial y_2}{\partial x_2}&\cdots&\dfrac{\partial y_2}{\partial x_n}\\&&\ddots&\\\dfrac{\partial y_m}{\partial x_1}&\dfrac{\partial y_m}{\partial x_2}&\cdots&\dfrac{\partial y_m}{\partial x_n}\end{bmatrix}\)

性质:

  • \(\dfrac{\partial(a\mathbf u)}{\partial\mathbf x}=a\dfrac{\partial\mathbf u}{\partial\mathbf x}\)
  • \(\dfrac{\partial(\mathbf{Au})}{\partial\mathbf x}=\mathbf A\dfrac{\partial\mathbf u}{\partial\mathbf x}\)
  • \(\dfrac{\partial (\mathbf u+\mathbf v)}{\partial\mathbf x}=\dfrac{\partial\mathbf u}{\partial\mathbf x}+\dfrac{\partial\mathbf v}{\partial\mathbf x}\)

总结

image.png 规律:求导后,分母shape不变,分子shape颠倒,尾部1略去