
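Matrices

Matrix Products

Understanding Matrix Multiplication

A matrix describes a linear transformation on a linear space. Once a basis has been chosen for the space, every linear transformation on it can be represented by one definite matrix.

Left-multiplying by a matrix performs row operations; right-multiplying by a matrix performs column operations.

In $\mathbf{C} = \mathbf{A}\times\mathbf{B}$, each column of $\mathbf{B}$ can be read as the coordinates of the corresponding column of $\mathbf{C}$ in the subspace spanned by the columns of $\mathbf{A}$, with those columns taken as the basis.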

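Hadamard Product (Element-wise Matrix Product)

The Hadamard product multiplies two matrices entry by entry and leaves the layout unchanged. It is informally called the element-wise product and is written with the hollow circle $\circ$. The two matrices must have the same shape.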
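
A minimal sketch of the element-wise product, using plain Python lists of rows so that it needs no external library. The function name `hadamard` and the sample matrices are illustrative choices, not part of the original text.

```python
def hadamard(a, b):
    """Element-wise product of two matrices given as lists of rows."""
    same_shape = len(a) == len(b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(a, b)
    )
    if not same_shape:
        raise ValueError("Hadamard product requires matrices of identical shape")
    return [[x * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = hadamard(A, B)  # each entry is A[i][j] * B[i][j]
```

Raising on a shape mismatch mirrors the requirement that both operands have the same shape.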

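Inner Product of Matrices

Symbol: $\langle\cdot,\cdot\rangle$. Purpose: to measure length. Definition: the inner product of two vectors $\mathbf{a}$ and $\mathbf{b}$ with $m$ elements each is the product of the first element of $\mathbf{a}$ and the first element of $\mathbf{b}$, plus the product of their second elements, and so on; it is the sum of $m$ such products. For example, with $m = 2$:

$$\langle\mathbf{a},\mathbf{b}\rangle = a_{1}b_{1} + a_{2}b_{2}$$

The inner product of two matrices $\mathbf{A}$ and $\mathbf{B}$ of the same shape is built the same way: take the inner product of the first vector of $\mathbf{A}$ with the first vector of $\mathbf{B}$, and so on, then add these inner products together. For $n\times n$ matrices this gives:

$$\langle\mathbf{A},\mathbf{B}\rangle = \sum_{i=1}^{n}\sum_{j=1}^{n}a_{ij}b_{ij}$$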
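
The double sum can be sketched in a few lines of plain Python. The helper name `matrix_inner` and the sample values are assumptions made for illustration.

```python
def matrix_inner(a, b):
    """Sum of a[i][j] * b[i][j] over all entries of two same-shaped matrices."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
inner_ab = matrix_inner(A, B)  # 1*5 + 2*6 + 3*7 + 4*8
inner_aa = matrix_inner(A, A)  # squared "length" of A
```

Taking the inner product of a matrix with itself gives the sum of its squared entries, which is the sense in which the inner product measures length.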

Kronecker Product

Symbol: $\otimes$ (LaTeX: \otimes). Definition: the Kronecker product is an operation on two matrices of arbitrary size and is a special case of the tensor product. Given $\boldsymbol{A}\in\mathbb{R}^{m\times n}$ and $\boldsymbol{B}\in\mathbb{R}^{p\times q}$, the Kronecker product of $\boldsymbol{A}$ and $\boldsymbol{B}$ is a block matrix in $\mathbb{R}^{mp\times nq}$:

$$\boldsymbol{A}\otimes \boldsymbol{B}=\begin{bmatrix}a_{11}\boldsymbol{B}&\cdots&a_{1n}\boldsymbol{B}\\ \vdots&\ddots&\vdots\\ a_{m1}\boldsymbol{B}&\cdots&a_{mn}\boldsymbol{B}\end{bmatrix}$$
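
The block structure can be sketched with index arithmetic: entry $(i, j)$ of the result is one entry of the first matrix times one entry of the second. The function name `kron` and the example matrices are illustrative assumptions.

```python
def kron(a, b):
    """Kronecker product of an m x n matrix a and a p x q matrix b (lists of rows)."""
    p, q = len(b), len(b[0])
    rows, cols = len(a) * p, len(a[0]) * q
    # Block (i // p, j // q) of the result is a[i // p][j // q] * b.
    return [
        [a[i // p][j // q] * b[i % p][j % q] for j in range(cols)]
        for i in range(rows)
    ]


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
K = kron(A, B)  # a 4 x 4 matrix made of the blocks 1*B, 2*B, 3*B, 4*B
```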

Matrix Differentiation

Gradient of a Real-Valued Function with Respect to a Real Vector

The gradient operator with respect to an $n\times 1$ vector $\mathbf{x}$ is written $\nabla_{\mathbf{x}}$ and defined as:

$$\nabla_{\mathbf{x}}=\left[\frac{\partial}{\partial x_1},\frac{\partial}{\partial x_2},\ldots,\frac{\partial}{\partial x_n}\right]^T=\frac{\partial}{\partial\mathbf{x}}$$

Hence the gradient of a real scalar function $f(\mathbf{x})$ of the $n\times 1$ real vector $\mathbf{x}$ is an $n\times 1$ column vector, defined as:

$$\nabla_{\mathbf{x}}f(\mathbf{x})=\left[\frac{\partial f(\mathbf{x})}{\partial x_1},\frac{\partial f(\mathbf{x})}{\partial x_2},\ldots,\frac{\partial f(\mathbf{x})}{\partial x_n}\right]^T=\frac{\partial f(\mathbf{x})}{\partial\mathbf{x}}$$

The negative of the gradient direction is called the gradient flow of the variable $\mathbf{x}$, written:

$$\dot{\mathbf{x}}=-\nabla_{\mathbf{x}}f(\mathbf{x})$$

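The defining formula of the gradient shows that:

  1. The gradient of a scalar function of a vector variable is a vector.
  2. Each component of the gradient gives the rate of change of the scalar function along that component's direction.

One of the most important properties of the gradient vector is that it points in the direction in which the function $f$ increases fastest. Conversely, the negative gradient points in the direction in which $f$ decreases fastest. This property is the basis for iterative algorithms that search for a minimum of the function.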
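
One such iterative algorithm is plain gradient descent, sketched below. The test function $f(\mathbf{x}) = (x_1 - 1)^2 + 2(x_2 + 2)^2$, the step size, and the iteration count are assumptions chosen for illustration; the original text does not prescribe a particular algorithm.

```python
def grad_f(x):
    """Gradient of the assumed test function f(x) = (x1 - 1)^2 + 2 * (x2 + 2)^2."""
    return [2.0 * (x[0] - 1.0), 4.0 * (x[1] + 2.0)]


def gradient_descent(grad, x0, step=0.1, iterations=200):
    """Repeatedly move against the gradient: x <- x - step * grad(x)."""
    x = list(x0)
    for _ in range(iterations):
        g = grad(x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


x_min = gradient_descent(grad_f, [0.0, 0.0])  # approaches the minimizer (1, -2)
```

A fixed step works here because the test function is a well-conditioned quadratic; other functions may need a smaller step or a line search.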

Similarly, the gradient of the real-valued function $f(\mathbf{x})$ with respect to the $1\times n$ row vector $\mathbf{x}^T$ is a $1\times n$ row vector, defined as:

$$\nabla_{\mathbf{x}^T}f(\mathbf{x})=\left[\frac{\partial f(\mathbf{x})}{\partial x_1},\frac{\partial f(\mathbf{x})}{\partial x_2},\ldots,\frac{\partial f(\mathbf{x})}{\partial x_n}\right]=\frac{\partial f(\mathbf{x})}{\partial\mathbf{x}^T}$$

The gradient of the $m$-dimensional row-vector function $\mathbf{f}(\mathbf{x})=[f_1(\mathbf{x}),\ldots,f_m(\mathbf{x})]$ with respect to the $n$-dimensional real vector $\mathbf{x}$ is an $n\times m$ matrix, defined as:

$$\frac{\partial\mathbf{f}(\mathbf{x})}{\partial\mathbf{x}}=\begin{bmatrix}\frac{\partial f_1(\mathbf{x})}{\partial x_1}&\frac{\partial f_2(\mathbf{x})}{\partial x_1}&\cdots&\frac{\partial f_m(\mathbf{x})}{\partial x_1}\\ \frac{\partial f_1(\mathbf{x})}{\partial x_2}&\frac{\partial f_2(\mathbf{x})}{\partial x_2}&\cdots&\frac{\partial f_m(\mathbf{x})}{\partial x_2}\\ \vdots&\vdots&\ddots&\vdots\\ \frac{\partial f_1(\mathbf{x})}{\partial x_n}&\frac{\partial f_2(\mathbf{x})}{\partial x_n}&\cdots&\frac{\partial f_m(\mathbf{x})}{\partial x_n}\end{bmatrix}=\nabla_{\mathbf{x}}\mathbf{f}(\mathbf{x})$$

For the $m\times 1$ vector function $\mathbf{f}(\mathbf{x})=\mathbf{y}=[y_1,\ldots,y_m]^T$, where $y_1,y_2,\ldots,y_m$ are scalar functions of the vector $\mathbf{x}$, the first-order gradient is:

$$\frac{\partial\mathbf{y}}{\partial\mathbf{x}^T}=\begin{bmatrix}\frac{\partial y_1}{\partial x_1}&\frac{\partial y_1}{\partial x_2}&\cdots&\frac{\partial y_1}{\partial x_n}\\ \frac{\partial y_2}{\partial x_1}&\frac{\partial y_2}{\partial x_2}&\cdots&\frac{\partial y_2}{\partial x_n}\\ \vdots&\vdots&\ddots&\vdots\\ \frac{\partial y_m}{\partial x_1}&\frac{\partial y_m}{\partial x_2}&\cdots&\frac{\partial y_m}{\partial x_n}\end{bmatrix}$$

$\frac{\partial\mathbf{y}}{\partial\mathbf{x}^T}$ is an $m\times n$ matrix, called the Jacobian matrix of the vector function $\mathbf{y}=[y_1,y_2,\ldots,y_m]^T$.

If $\mathbf{f}(\mathbf{x}) = [x_{1},x_{2},\ldots ,x_{n}] = \mathbf{x}^{T}$, then:

$$\frac{\partial \mathbf{x}^{T}}{\partial \mathbf{x}} = \mathbf{I}$$

This result is important: several of the identities that follow are derived from it.

If $\mathbf{A}$ and $\mathbf{y}$ are both independent of $\mathbf{x}$, then:

$$\frac{\partial \mathbf{x}^{T}\mathbf{A}\mathbf{y}}{\partial \mathbf{x}} = \frac{\partial \mathbf{x}^{T}}{\partial \mathbf{x}}\mathbf{A}\mathbf{y} = \mathbf{A}\mathbf{y}$$

Because $\mathbf{y}^{T}\mathbf{A}\mathbf{x} = \langle \mathbf{A}^{T}\mathbf{y},\mathbf{x} \rangle = \langle \mathbf{x},\mathbf{A}^{T}\mathbf{y} \rangle = \mathbf{x}^{T}\mathbf{A}^{T} \mathbf{y}$, it follows that:

$$\frac{\partial \mathbf{y}^{T}\mathbf{A}\mathbf{x}}{\partial \mathbf{x}} = \mathbf{A}^{T}\mathbf{y}$$

Since:

$$\mathbf{x}^{T}\mathbf{A}\mathbf{x} = \sum_{i=1}^{n}\sum_{j=1}^{n}A_{ij}x_{i}x_{j}$$

the $k$-th component of the gradient $\frac{\partial\mathbf{x}^{T}\mathbf{A}\mathbf{x}}{\partial \mathbf{x}}$ is:

$$\bigg[ \frac{\partial \mathbf{x}^{T}\mathbf{A}\mathbf{x}}{\partial \mathbf{x}} \bigg]_{k} = \frac{\partial}{\partial x_{k}} \sum_{i=1}^{n}\sum_{j=1}^{n}A_{ij}x_{i}x_{j} = \sum_{i=1}^{n}A_{ik}x_{i} + \sum_{j=1}^{n}A_{kj}x_{j}$$

That is:

$$\frac{\partial \mathbf{x}^{T}\mathbf{A}\mathbf{x}}{\partial \mathbf{x}} = \mathbf{A}\mathbf{x} + \mathbf{A}^{T}\mathbf{x}$$

In particular, if $\mathbf{A}$ is symmetric:

$$\frac{\partial \mathbf{x}^{T}\mathbf{A}\mathbf{x}}{\partial \mathbf{x}} = 2\mathbf{A}\mathbf{x}$$
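
The identity $\partial(\mathbf{x}^{T}\mathbf{A}\mathbf{x})/\partial\mathbf{x} = \mathbf{A}\mathbf{x} + \mathbf{A}^{T}\mathbf{x}$ can be sanity-checked numerically with central differences. This is only a sketch: the helper names, the non-symmetric test matrix, and the step `h` are assumptions made for illustration.

```python
def quad_form(A, x):
    """Evaluate x^T A x as the double sum of A[i][j] * x[i] * x[j]."""
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def numeric_grad(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    grad = []
    for k in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[k] += h
        x_minus[k] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


A = [[1.0, 2.0], [0.0, 3.0]]  # deliberately not symmetric
x = [1.0, -1.0]

# Analytic gradient: component k is sum_j (A[k][j] + A[j][k]) * x[j].
analytic = [sum((A[k][j] + A[j][k]) * x[j] for j in range(2)) for k in range(2)]
numeric = numeric_grad(lambda v: quad_form(A, v), x)
```

Using a non-symmetric matrix matters here: with a symmetric one the check could not tell $\mathbf{A}\mathbf{x} + \mathbf{A}^{T}\mathbf{x}$ apart from $2\mathbf{A}\mathbf{x}$.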

Collecting the results of the three examples above together with other results gives the following commonly used gradient formulas for a real-valued function $f(\mathbf{x})$ with respect to a column vector $\mathbf{x}$.

If $f(\mathbf{x}) = c$ is a constant, then the gradient is $\frac{\partial c}{\partial \mathbf{x}} = \mathbf{0}$.

Linearity: if $f(\mathbf{x})$ and $g(\mathbf{x})$ are real-valued functions of the vector $\mathbf{x}$ and $c_{1}$, $c_{2}$ are real constants, then:

$$\frac{\partial[c_{1}f(\mathbf{x}) + c_{2}g(\mathbf{x})]}{\partial \mathbf{x}} = c_{1}\frac{\partial f(\mathbf{x})}{\partial \mathbf{x}} + c_{2}\frac{\partial g(\mathbf{x})}{\partial \mathbf{x}}$$

Product rule: if $f(\mathbf{x})$ and $g(\mathbf{x})$ are both real-valued functions of the vector $\mathbf{x}$, then:

$$\frac{\partial [f(\mathbf{x})g(\mathbf{x})]}{\partial \mathbf{x}} = g(\mathbf{x})\frac{\partial f(\mathbf{x})}{\partial \mathbf{x}} + f(\mathbf{x}) \frac{\partial g(\mathbf{x})}{\partial \mathbf{x}}$$

Quotient rule: if $g(\mathbf{x})\neq 0$, then:

$$\frac{\partial [f(\mathbf{x})/g(\mathbf{x})]}{\partial \mathbf{x}} = \frac{1}{g^{2}(\mathbf{x})}\bigg[ g(\mathbf{x})\frac{\partial f(\mathbf{x})}{\partial \mathbf{x}} - f(\mathbf{x}) \frac{\partial g(\mathbf{x})}{\partial \mathbf{x}} \bigg]$$

Chain rule: if $\mathbf{y}(\mathbf{x})$ is a vector-valued function of $\mathbf{x}$, then:

$$\frac{\partial f(\mathbf{y}(\mathbf{x}))}{\partial \mathbf{x}} = \frac{\partial \mathbf{y}^{T}(\mathbf{x})}{\partial \mathbf{x}}\frac{\partial f(\mathbf{y})}{\partial \mathbf{y}}$$

where $\frac{ \partial\mathbf{y}^{T}(\mathbf{x})}{\partial \mathbf{x}}$ is an $n\times n$ matrix (taking $\mathbf{y}$ to be $n\times 1$, like $\mathbf{x}$).

Examples

If the $n\times 1$ vector $\mathbf{a}$ is a constant vector independent of $\mathbf{x}$, then:

$$\frac{\partial \mathbf{a}^{T}\mathbf{x}}{\partial \mathbf{x}} = \mathbf{a} \qquad \frac{\partial\mathbf{x}^{T}\mathbf{a}}{\partial \mathbf{x}} = \mathbf{a}$$

If the $n\times 1$ vector $\mathbf{a}$ is a constant vector independent of $\mathbf{x}$ and $\mathbf{y}(\mathbf{x})$ is a vector function of $\mathbf{x}$, then:

$$\frac{\partial\mathbf{a}^{T}\mathbf{y}(\mathbf{x})}{\partial \mathbf{x}} = \frac{\partial \mathbf{y}^{T}(\mathbf{x})}{\partial \mathbf{x}} \mathbf{a} \qquad \frac{\partial\mathbf{y}^{T}(\mathbf{x})\mathbf{a}}{\partial \mathbf{x}} =\frac{\partial\mathbf{y}^{T}(\mathbf{x})}{\partial \mathbf{x}} \mathbf{a}$$

If $\mathbf{A}$ and $\mathbf{y}$ are both independent of $\mathbf{x}$, then:

$$\frac{\partial \mathbf{x}^{T}\mathbf{A}\mathbf{y}}{\partial \mathbf{x}} = \mathbf{A}\mathbf{y} \qquad \frac{\partial \mathbf{y}^{T}\mathbf{A}\mathbf{x}}{\partial \mathbf{x}} = \mathbf{A}^{T}\mathbf{y}$$

If $\mathbf{A}$ is independent of $\mathbf{x}$ while $\mathbf{y}(\mathbf{x})$ depends on the elements of the vector $\mathbf{x}$, then:

$$\frac{\partial[\mathbf{y}(\mathbf{x})]^{T} \mathbf{A}\mathbf{y}(\mathbf{x})}{\partial \mathbf{x}} = \frac{\partial[\mathbf{y}(\mathbf{x})]^{T}}{\partial \mathbf{x}}(\mathbf{A} + \mathbf{A}^{T})\mathbf{y}(\mathbf{x})$$

If $\mathbf{A}$ is a matrix independent of the vector $\mathbf{x}$, while $\mathbf{y}(\mathbf{x})$ and $\mathbf{z}(\mathbf{x})$ are column vectors that depend on the elements of $\mathbf{x}$, then:

$$\frac{\partial [\mathbf{y}(\mathbf{x})]^{T} \mathbf{A}\mathbf{z}(\mathbf{x})}{\partial \mathbf{x}} = \frac{\partial [\mathbf{y}(\mathbf{x})]^{T}}{\partial \mathbf{x}} \mathbf{A}\mathbf{z}(\mathbf{x}) + \frac{\partial [\mathbf{z}(\mathbf{x})]^{T}}{\partial \mathbf{x}}\mathbf{A}^{T}\mathbf{y}(\mathbf{x})$$

Let $\mathbf{x}$ be an $n\times 1$ vector, $\mathbf{a}$ an $m\times 1$ constant vector, and $\mathbf{A}$ and $\mathbf{B}$ constant matrices of size $m\times n$ and $m\times m$ respectively, with $\mathbf{B}$ symmetric. Then:

$$\frac{\partial (\mathbf{a} - \mathbf{A} \mathbf{x})^{T}\mathbf{B}(\mathbf{a} - \mathbf{A}\mathbf{x})}{\partial \mathbf{x}} = -2\mathbf{A}^{T}\mathbf{B}(\mathbf{a} - \mathbf{A}\mathbf{x})$$
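
The last result follows from the identities already listed. As a worked sketch, write $\mathbf{y}(\mathbf{x}) = \mathbf{a} - \mathbf{A}\mathbf{x}$ (the symbol $\mathbf{y}$ is introduced here only for the derivation), use $\partial\mathbf{x}^{T}/\partial\mathbf{x} = \mathbf{I}$, and apply the formula for $\partial(\mathbf{y}^{T}\mathbf{A}\mathbf{y})/\partial\mathbf{x}$ with $\mathbf{B}$ in the role of the constant matrix:

```latex
\begin{aligned}
\frac{\partial \mathbf{y}^{T}(\mathbf{x})}{\partial \mathbf{x}}
  &= \frac{\partial (\mathbf{a}^{T} - \mathbf{x}^{T}\mathbf{A}^{T})}{\partial \mathbf{x}}
   = -\frac{\partial \mathbf{x}^{T}}{\partial \mathbf{x}}\mathbf{A}^{T}
   = -\mathbf{A}^{T} \\
\frac{\partial\, \mathbf{y}^{T}\mathbf{B}\mathbf{y}}{\partial \mathbf{x}}
  &= \frac{\partial \mathbf{y}^{T}(\mathbf{x})}{\partial \mathbf{x}}
     (\mathbf{B} + \mathbf{B}^{T})\,\mathbf{y}(\mathbf{x})
   = -\mathbf{A}^{T}(2\mathbf{B})(\mathbf{a} - \mathbf{A}\mathbf{x})
   = -2\mathbf{A}^{T}\mathbf{B}(\mathbf{a} - \mathbf{A}\mathbf{x})
\end{aligned}
```

The step $\mathbf{B} + \mathbf{B}^{T} = 2\mathbf{B}$ is where the symmetry of $\mathbf{B}$ is used.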

Gradient Matrix of a Real-Valued Function

In optimization problems the quantity to be optimized may be a weight matrix, so it is necessary to analyze the gradient of a real-valued function with respect to a matrix variable.

The gradient of a real-valued function $f(\mathbf{A})$ with respect to an $m\times n$ real matrix $\mathbf{A}$ is an $m\times n$ matrix, called the gradient matrix for short, defined as:

$$\frac{\partial f(\mathbf{A})}{\partial \mathbf{A}} = \begin{bmatrix} \frac{\partial f(\mathbf{A})}{\partial A_{11}} & \frac{\partial f(\mathbf{A})}{\partial A_{12}} & \cdots & \frac{\partial f(\mathbf{A})}{\partial A_{1n}} \\ \frac{\partial f(\mathbf{A})}{\partial A_{21}} & \frac{\partial f(\mathbf{A})}{\partial A_{22}} & \cdots & \frac{\partial f(\mathbf{A})}{\partial A_{2n}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f(\mathbf{A})}{\partial A_{m1}} & \frac{\partial f(\mathbf{A})}{\partial A_{m2}} & \cdots & \frac{\partial f(\mathbf{A})}{\partial A_{mn}} \end{bmatrix}$$

where $A_{ij}$ are the elements of $\mathbf{A}$.

The gradient of a real-valued function with respect to a matrix variable has the following properties:

If $f(\mathbf{A}) = c$ is a constant, where $\mathbf{A}$ is an $m\times n$ matrix, then the gradient is $\frac{\partial c}{\partial \mathbf{A}} = \mathbf{O}_{m\times n}$.

Linearity: if $f(\mathbf{A})$ and $g(\mathbf{A})$ are real-valued functions of the matrix $\mathbf{A}$ and $c_{1}$, $c_{2}$ are real constants, then:

$$\frac{\partial [c_{1}f(\mathbf{A}) + c_{2}g(\mathbf{A})]}{\partial \mathbf{A}} = c_{1}\frac{\partial f(\mathbf{A})}{\partial \mathbf{A}} + c_{2}\frac{\partial g(\mathbf{A})}{\partial \mathbf{A}}$$

Product rule: if $f(\mathbf{A})$ and $g(\mathbf{A})$ are both real-valued functions of the matrix $\mathbf{A}$, then:

$$\frac{\partial [f(\mathbf{A})g(\mathbf{A})]}{\partial \mathbf{A}} = f(\mathbf{A})\frac{\partial g(\mathbf{A})}{\partial \mathbf{A}} + g(\mathbf{A}) \frac{\partial f(\mathbf{A})}{\partial \mathbf{A}}$$

Quotient rule: if $g(\mathbf{A})\neq 0$, then:

$$\frac{\partial [f(\mathbf{A})/g(\mathbf{A})]}{\partial \mathbf{A}} = \frac{1}{[g(\mathbf{A})]^{2}} \bigg[ g(\mathbf{A}) \frac{\partial f(\mathbf{A})}{\partial \mathbf{A}} - f(\mathbf{A}) \frac{\partial g(\mathbf{A})}{\partial \mathbf{A}} \bigg]$$

Chain rule: let $\mathbf{A}$ be an $m\times n$ matrix, and let $y=f(\mathbf{A})$ and $g(y)$ be real-valued functions of the matrix $\mathbf{A}$ and of the scalar $y$ respectively. Then:

$$\frac{\partial g(f(\mathbf{A}))}{\partial \mathbf{A}} = \frac{\mathrm{d}g(y)}{\mathrm{d} y}\frac{\partial f(\mathbf{A})}{\partial \mathbf{A}}$$

Examples

If $\mathbf{A}\in \mathbb{R}^{m\times n}$, $\mathbf{x}\in \mathbb{R}^{m\times 1}$ and $\mathbf{y}\in \mathbb{R}^{n\times 1}$, then:

$$\frac{\partial \mathbf{x}^{T}\mathbf{A}\mathbf{y}}{\partial \mathbf{A}} = \mathbf{x}\mathbf{y}^{T}$$

If $\mathbf{A}\in \mathbb{R}^{n\times n}$ is nonsingular, $\mathbf{x}\in \mathbb{R}^{n\times 1}$ and $\mathbf{y}\in \mathbb{R}^{n\times 1}$, then:

$$\frac{\partial \mathbf{x}^{T} \mathbf{A}^{-1}\mathbf{y}}{\partial \mathbf{A}} = -\mathbf{A}^{-T}\mathbf{x}\mathbf{y}^{T}\mathbf{A}^{-T}$$

If $\mathbf{A}\in \mathbb{R}^{m\times n}$, $\mathbf{x}\in \mathbb{R}^{n\times 1}$ and $\mathbf{y}\in \mathbb{R}^{n\times 1}$, then:

$$\frac{\partial \mathbf{x}^{T} \mathbf{A}^{T}\mathbf{A}\mathbf{y}}{\partial \mathbf{A}} = \mathbf{A}(\mathbf{x}\mathbf{y}^{T} + \mathbf{y}\mathbf{x}^{T})$$

If $\mathbf{A}\in \mathbb{R}^{m\times n}$ and $\mathbf{x},\mathbf{y}\in \mathbb{R}^{m\times 1}$, then:

$$\frac{\partial \mathbf{x}^{T}\mathbf{A}\mathbf{A}^{T}\mathbf{y}}{\partial \mathbf{A}} = (\mathbf{x}\mathbf{y}^{T} + \mathbf{y}\mathbf{x}^{T})\mathbf{A}$$

Gradient of the exponential function:

$$\frac{\partial \exp(\mathbf{x}^{T}\mathbf{A}\mathbf{y})}{\partial \mathbf{A}} = \mathbf{x}\mathbf{y}^{T} \exp(\mathbf{x}^{T}\mathbf{A}\mathbf{y})$$
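
The first example, $\partial(\mathbf{x}^{T}\mathbf{A}\mathbf{y})/\partial\mathbf{A} = \mathbf{x}\mathbf{y}^{T}$, can be checked entry by entry with finite differences. The sketch below is illustrative only; the helper names, the $2\times 3$ test matrix, and the step `h` are assumptions.

```python
def bilinear(x, A, y):
    """Evaluate x^T A y for x of length m, y of length n, and an m x n matrix A."""
    return sum(x[i] * A[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))


def numeric_matrix_grad(f, A, h=1e-6):
    """Central-difference gradient matrix: entry (i, j) approximates df/dA[i][j]."""
    grad = [[0.0] * len(A[0]) for _ in A]
    for i in range(len(A)):
        for j in range(len(A[0])):
            A_plus = [row[:] for row in A]
            A_minus = [row[:] for row in A]
            A_plus[i][j] += h
            A_minus[i][j] -= h
            grad[i][j] = (f(A_plus) - f(A_minus)) / (2 * h)
    return grad


x = [1.0, 2.0]
y = [3.0, -1.0, 0.5]
A = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]

outer = [[xi * yj for yj in y] for xi in x]  # the claimed gradient x y^T
numeric = numeric_matrix_grad(lambda M: bilinear(x, M, y), A)
```

The gradient matrix has the same $m\times n$ shape as $\mathbf{A}$, which is why the outer product is $\mathbf{x}\mathbf{y}^{T}$ rather than $\mathbf{y}\mathbf{x}^{T}$.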

Gradient Matrix of the Trace Function

Sometimes a quadratic objective function can be rewritten using the trace of a matrix. Because a scalar can be regarded as a $1\times 1$ matrix, the trace of a quadratic objective function equals the function itself, i.e. $f(\mathbf{x}) = \mathbf{x}^{T}\mathbf{A}\mathbf{x} = \mathrm{tr}(\mathbf{x}^{T}\mathbf{A}\mathbf{x})$. Using the property $\mathrm{tr}(\mathbf{A}\mathbf{B}) = \mathrm{tr}(\mathbf{B}\mathbf{A})$ of the trace, the objective function can be written further as:

$$f(\mathbf{x}) = \mathbf{x}^{T}\mathbf{A}\mathbf{x} = \mathrm{tr}(\mathbf{x}^{T}\mathbf{A}\mathbf{x}) = \mathrm{tr}(\mathbf{A}\mathbf{x}\mathbf{x}^{T})$$

Therefore the quadratic objective function $\mathbf{x}^{T}\mathbf{A}\mathbf{x}$ equals the trace of the product of the kernel matrix $\mathbf{A}$ and the vector outer product $\mathbf{x}\mathbf{x}^{T}$, namely $\mathrm{tr}(\mathbf{A}\mathbf{x}\mathbf{x}^{T})$.

For an $n\times n$ matrix $\mathbf{A}$, since $\mathrm{tr}(\mathbf{A}) = \sum_{i=1}^{n}A_{ii}$, the $(i,j)$ element of the gradient $\frac{\partial \mathrm{tr}(\mathbf{A})}{\partial \mathbf{A}}$ is:

$$\bigg[\frac{\partial \mathrm{tr}(\mathbf{A})}{\partial \mathbf{A}} \bigg]_{ij} = \frac{\partial}{\partial A_{ij}}\sum_{k=1}^{n}A_{kk} = \begin{cases} 1 & j=i \\ 0 & j\neq i \end{cases}$$

Hence $\frac{\partial \mathrm{tr}(\mathbf{A})}{\partial \mathbf{A}} = \mathbf{I}$.

Consider the objective function $f(\mathbf{A}) = \mathrm{tr}(\mathbf{A}\mathbf{B})$, where $\mathbf{A}$ and $\mathbf{B}$ are real matrices of size $m\times n$ and $n\times m$ respectively. First, the elements of the matrix product are $[\mathbf{A}\mathbf{B}]_{ij} = \sum_{l=1}^{n}A_{il}B_{lj}$, so the trace of the product is $\mathrm{tr}(\mathbf{A}\mathbf{B}) = \sum_{p=1}^{m}\sum_{l=1}^{n}A_{pl}B_{lp}$. The gradient $\frac{\partial \mathrm{tr}(\mathbf{A}\mathbf{B})}{\partial \mathbf{A}}$ is therefore an $m\times n$ matrix with elements:

$$\bigg[ \frac{\partial \mathrm{tr}(\mathbf{A}\mathbf{B})}{\partial \mathbf{A}} \bigg]_{ij} = \frac{\partial }{\partial A_{ij}} \bigg(\sum_{p=1}^{m}\sum_{l=1}^{n}A_{pl}B_{lp} \bigg) = B_{ji}$$

Hence:

$$\frac{\partial \mathrm{tr}(\mathbf{A}\mathbf{B})}{\partial \mathbf{A}} = \nabla_{\mathbf{A}} \mathrm{tr}(\mathbf{A}\mathbf{B}) = \mathbf{B}^{T}$$

Since $\mathrm{tr}(\mathbf{B}\mathbf{A}) = \mathrm{tr}(\mathbf{A}\mathbf{B})$, it follows that:

$$\frac{ \partial \mathrm{tr}(\mathbf{A}\mathbf{B}) }{\partial \mathbf{A}} = \frac{\partial \mathrm{tr}(\mathbf{B}\mathbf{A})}{\partial \mathbf{A}} = \mathbf{B}^{T}$$

In the same way, since $\mathrm{tr}(\mathbf{x}\mathbf{y}^{T}) = \mathrm{tr}(\mathbf{y}\mathbf{x}^{T}) = \mathbf{x}^{T}\mathbf{y}$, we have:

$$\frac{\partial \mathrm{tr}(\mathbf{x}\mathbf{y}^{T})}{\partial \mathbf{x}} = \frac{\partial \mathrm{tr}(\mathbf{y}\mathbf{x}^{T})}{\partial \mathbf{x}} = \mathbf{y}$$

Hessian Matrix

The second-order partial derivative of a real-valued function $f(\mathbf{x})$ with respect to an $n\times 1$ real vector $\mathbf{x}$ is a matrix made up of $n^{2}$ second-order partial derivatives. It is called the Hessian matrix and is defined as:

$$\frac{\partial^{2} f(\mathbf{x})}{\partial \mathbf{x} \partial \mathbf{x}^{T}} = \frac{\partial}{\partial \mathbf{x}^{T}} \bigg[\frac{\partial f(\mathbf{x})}{\partial \mathbf{x}} \bigg]$$

or, written more compactly as the gradient of the gradient:

$$\nabla_{\mathbf{x}}^{2}f(\mathbf{x}) = \nabla_{\mathbf{x}} (\nabla_{\mathbf{x}} f(\mathbf{x}))$$

By definition, the $j$-th column of the Hessian matrix is the gradient of the $j$-th component of the gradient $\frac{\partial f(\mathbf{x})}{\partial \mathbf{x}} = \nabla_{\mathbf{x}} f(\mathbf{x})$, that is:

$$\bigg[ \frac{\partial^{2}f(\mathbf{x}) }{\partial \mathbf{x} \partial \mathbf{x}^{T}} \bigg]_{i,j} = \frac{\partial^{2}f(\mathbf{x})}{\partial x_{i} \partial x_{j}}$$

Equivalently:

$$\frac{\partial^{2} f(\mathbf{x})}{\partial \mathbf{x} \partial \mathbf{x}^{T}} = \begin{bmatrix} \frac{\partial^{2}f}{\partial x_{1}\partial x_{1}} & \frac{\partial^{2}f}{\partial x_{1}\partial x_{2}} & \cdots & \frac{\partial^{2}f}{\partial x_{1}\partial x_{n}} \\ \frac{\partial^{2}f}{\partial x_{2}\partial x_{1}} & \frac{\partial^{2}f}{\partial x_{2}\partial x_{2}} & \cdots & \frac{\partial^{2}f}{\partial x_{2}\partial x_{n}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^{2}f}{\partial x_{n}\partial x_{1}} & \frac{\partial^{2}f}{\partial x_{n}\partial x_{2}} & \cdots & \frac{\partial^{2}f}{\partial x_{n}\partial x_{n}} \end{bmatrix}$$

The Hessian matrix can therefore be computed in two steps:

  1. Take the partial derivatives of the real-valued function $f(\mathbf{x})$ with respect to the vector variable $\mathbf{x}$ to obtain its gradient $\frac{\partial f(\mathbf{x})}{\partial \mathbf{x}}$.
  2. Take the partial derivatives of the gradient $\frac{\partial f(\mathbf{x})}{\partial \mathbf{x}}$ with respect to the $1\times n$ row vector $\mathbf{x}^{T}$ to obtain the gradient of the gradient, which is the Hessian matrix.

These steps yield the following formulas for the Hessian matrix.

For an $n\times 1$ constant vector $\mathbf{a}$:

$$\frac{\partial^{2} \mathbf{a}^{T}\mathbf{x}}{\partial \mathbf{x}\partial \mathbf{x}^{T}} = \mathbf{O}_{n\times n}$$

If $\mathbf{A}$ is an $n\times n$ matrix, then:

$$\frac{\partial^{2} \mathbf{x}^{T}\mathbf{A}\mathbf{x}}{\partial \mathbf{x}\partial \mathbf{x}^{T}} = \mathbf{A} + \mathbf{A}^{T}$$

Let $\mathbf{x}$ be an $n\times 1$ vector, $\mathbf{a}$ an $m\times 1$ constant vector, and $\mathbf{A}$ and $\mathbf{B}$ constant matrices of size $m\times n$ and $m\times m$ respectively, with $\mathbf{B}$ symmetric. Then:

$$\frac{\partial^{2}(\mathbf{a} - \mathbf{A}\mathbf{x})^{T}\mathbf{B} (\mathbf{a} - \mathbf{A}\mathbf{x}) }{\partial \mathbf{x} \partial \mathbf{x}^{T}} = 2\mathbf{A}^{T}\mathbf{B}\mathbf{A}$$
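
The Hessian formula for the quadratic form, $\mathbf{A} + \mathbf{A}^{T}$, can be sanity-checked with second-order central differences. This is a sketch under stated assumptions: the helper names, the test matrix, the evaluation point, and the step `h` are all illustrative.

```python
def quad_form(A, x):
    """Evaluate x^T A x as the double sum of A[i][j] * x[i] * x[j]."""
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def numeric_hessian(f, x, h=1e-3):
    """Approximate d^2 f / (dx_i dx_j) with a four-point central difference."""
    n = len(x)

    def shifted(i, sign_i, j, sign_j):
        point = list(x)
        point[i] += sign_i * h
        point[j] += sign_j * h
        return f(point)

    return [
        [
            (shifted(i, 1, j, 1) - shifted(i, 1, j, -1)
             - shifted(i, -1, j, 1) + shifted(i, -1, j, -1)) / (4 * h * h)
            for j in range(n)
        ]
        for i in range(n)
    ]


A = [[1.0, 2.0], [0.0, 3.0]]
x = [0.5, -1.5]
H = numeric_hessian(lambda v: quad_form(A, v), x)
expected = [[A[i][j] + A[j][i] for j in range(2)] for i in range(2)]  # A + A^T
```

For a quadratic function the Hessian does not depend on the evaluation point, so any `x` gives the same matrix up to rounding.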

Differentiation via the Total Differential

The Trace tr(A) and the First-Order Real Matrix Differential dX

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}$$

Trace of a matrix: $\mathrm{tr}(A) = a_{11} + a_{22} + \cdots + a_{nn} = \sum\limits_{i = 1}^{n} a_{ii}$

Only square matrices have a trace.

Commutativity: $\mathrm{tr}(AB) = \mathrm{tr}(BA)$ for $A_{m \times n}$ and $B_{n \times m}$.

Total differential of a real scalar function of a matrix variable: $df(X) = \mathrm{tr}\left(\frac{\partial f(X)}{\partial X^T}dX\right)$

The derivative of any real scalar function of a matrix variable or a vector variable can be obtained from this formula.

Differentiating with the matrix differential:

For a real scalar function $f(X)$ we have $\mathrm{tr}(f(X)) = f(X)$ and $df(X) = \mathrm{tr}(df(X))$, so $df(X) = \mathrm{tr}(df(X)) = d(\mathrm{tr}\,f(X))$.

If the real scalar function is itself the trace of a matrix function $F_{p \times p}(X)$, such as $\mathrm{tr}\,F(X)$, then the linearity of the total differential gives:

$$d(\mathrm{tr}\,F_{p \times p}(X)) = d\left(\sum\limits_{i = 1}^{p} f_{ii}(X)\right) = \sum\limits_{i = 1}^{p} d(f_{ii}(X)) = \mathrm{tr}(dF_{p \times p}(X))$$

Common Derivatives

  • $\frac{\partial (x^T a)}{\partial x} = \frac{\partial (a^T x)}{\partial x} = a$
  • $\frac{\partial (x^T x)}{\partial x} = 2x$
  • $\frac{\partial (x^T A x)}{\partial x} = Ax + A^T x$, where $A_{n \times n} = (a_{ij})_{i = 1,j = 1}^{n,n}$
  • $\frac{\partial (a^T x x^T b)}{\partial x} = ab^T x + ba^T x$, where $a = (a_1,a_2,\ldots,a_n)^T$ and $b = (b_1,b_2,\ldots,b_n)^T$
  • $\frac{\partial (a^T X b)}{\partial X} = ab^T$, with $a_{m \times 1}$, $b_{n \times 1}$, $X_{m \times n}$
  • $\frac{\partial (a^T X^T b)}{\partial X} = ba^T$, with $a_{n \times 1}$, $b_{m \times 1}$, $X_{m \times n}$
  • $\frac{\partial (a^T X X^T b)}{\partial X} = ab^T X + ba^T X$, with $a_{m \times 1}$, $b_{m \times 1}$, $X_{m \times m}$
  • $\frac{\partial (a^T X^T X b)}{\partial X} = Xab^T + Xba^T$, with $a_{m \times 1}$, $b_{m \times 1}$, $X_{m \times m}$

Useful results:

Proof that $d\left| X \right| = \left| X \right|\mathrm{tr}(X^{-1}dX)$:

$$\begin{gathered} \left| X \right| = x_{i1}A_{i1} + x_{i2}A_{i2} + \cdots + x_{in}A_{in} \\ \frac{\partial \left| X \right|}{\partial x_{ij}} = A_{ij} \\ \frac{\partial \left| X \right|}{\partial X^T} = \begin{bmatrix} A_{11} & A_{21} & \cdots & A_{n1} \\ A_{12} & A_{22} & \cdots & A_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ A_{1n} & A_{2n} & \cdots & A_{nn} \end{bmatrix} = X^{*} = \left| X \right| X^{-1} \\ d\left| X \right| = \mathrm{tr}\left(\frac{\partial \left| X \right|}{\partial X^T}dX\right) = \mathrm{tr}(\left| X \right| X^{-1}dX) \end{gathered}$$

$$d(X^{-1}) = -X^{-1}\,dX\,X^{-1}$$
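
The determinant identity $d|X| = |X|\,\mathrm{tr}(X^{-1}dX)$ can be checked numerically on a $2\times 2$ matrix by comparing the actual change in the determinant with the first-order prediction. This is a minimal sketch; the helper names, the matrix `X`, and the small perturbation `dX` are assumptions chosen for illustration.

```python
def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def inv2(m):
    """Inverse of a nonsingular 2 x 2 matrix via the adjugate."""
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]


def trace_of_product(a, b):
    """tr(a b) for 2 x 2 matrices, computed without forming the product."""
    return sum(a[i][k] * b[k][i] for i in range(2) for k in range(2))


X = [[2.0, 1.0], [1.0, 3.0]]
dX = [[1e-6, 2e-6], [-1e-6, 3e-6]]  # small perturbation
X_plus = [[X[i][j] + dX[i][j] for j in range(2)] for i in range(2)]

actual = det2(X_plus) - det2(X)
predicted = det2(X) * trace_of_product(inv2(X), dX)
```

The two values agree only to first order in `dX`; the remaining gap is a second-order term, which is why the perturbation is kept small.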

Let $A$ be the Jacobian matrix obtained without taking into account that the matrix variable $X$ is symmetric:

$$A = \begin{bmatrix} \frac{\partial f}{\partial x_{11}} & \frac{\partial f}{\partial x_{21}} & \cdots & \frac{\partial f}{\partial x_{n1}} \\ \frac{\partial f}{\partial x_{12}} & \frac{\partial f}{\partial x_{22}} & \cdots & \frac{\partial f}{\partial x_{n2}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f}{\partial x_{1n}} & \frac{\partial f}{\partial x_{2n}} & \cdots & \frac{\partial f}{\partial x_{nn}} \end{bmatrix}_{n \times n}$$

Derivative formula for a real scalar function of a symmetric matrix variable, where $\circ$ is the Hadamard product and $E$ is the identity matrix:

$$\frac{\partial f(X)}{\partial X_{n \times n}} = \frac{\partial f(X)}{\partial X_{n \times n}^T} = A^T + A - (A \circ E)$$

Let $x \sim N_p(\mu ,\Sigma )$ with $\Sigma > 0$, that is, $\Sigma$ is a positive definite covariance matrix. The probability density function of $x$ is:

$$f(\mathbf{x}) = \frac{1}{(2\pi )^{\frac{p}{2}}\left| \Sigma \right|^{\frac{1}{2}}}e^{ - \frac{1}{2}(\mathbf{x} - \mu )^T\Sigma ^{-1}(\mathbf{x} - \mu )}$$

Log-likelihood function:

$$\ln L(\mu ,\Sigma ) = \ln \left(\prod\limits_{i = 1}^n f(x_i)\right) = - \frac{p}{2}n\ln (2\pi ) - \frac{1}{2}n\ln \left| \Sigma \right| - \frac{1}{2}\sum\limits_{i = 1}^n \left[(x_i - \mu )^T\Sigma ^{-1}(x_i - \mu )\right]$$

Differentiating:

$$\frac{\partial \ln L(\mu ,\Sigma )}{\partial \mu } = \Sigma ^{-1}\sum\limits_{i = 1}^n (x_i - \mu )$$

$$\frac{\partial \ln L(\mu ,\Sigma )}{\partial \Sigma } = \Sigma ^{-1}\left(\sum\limits_{i = 1}^n (x_i - \mu )(x_i - \mu )^T\right)\Sigma ^{-1} - n\Sigma ^{-1} - \left\{ \frac{1}{2}\left[\Sigma ^{-1}\left(\sum\limits_{i = 1}^n (x_i - \mu )(x_i - \mu )^T\right)\Sigma ^{-1} - n\Sigma ^{-1}\right] \circ E \right\}$$

Setting the derivatives to zero gives:

$$\begin{gathered} \mu = \overline{x} = \frac{1}{n}\sum\limits_{i = 1}^n x_i \\ \Sigma = \frac{1}{n}\sum\limits_{i = 1}^n (x_i - \overline{x})(x_i - \overline{x})^T \end{gathered}$$
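
The two closed-form estimates can be sketched directly: the sample mean, and the covariance normalized by $n$ (not $n - 1$), as in the formulas above. The function name `gaussian_mle` and the three sample points are assumptions made for illustration.

```python
def gaussian_mle(samples):
    """Maximum-likelihood mean and covariance for samples given as lists of length p."""
    n, p = len(samples), len(samples[0])
    mean = [sum(x[k] for x in samples) / n for k in range(p)]
    cov = [
        [sum((x[i] - mean[i]) * (x[j] - mean[j]) for x in samples) / n
         for j in range(p)]
        for i in range(p)
    ]
    return mean, cov


samples = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]
mu_hat, sigma_hat = gaussian_mle(samples)
```

With so few samples the estimated covariance may be poorly conditioned; the data here only illustrate the arithmetic.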

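Eigenvalues and Eigenvectors of Hermitian Matrices

Symmetric matrices appear constantly in signal processing. The complex counterpart of a real symmetric matrix, a complex matrix equal to its own conjugate transpose, is called a Hermitian matrix. For example, for real observed data $x(t)$ the autocorrelation matrix $R=E[x(t)x^T(t)]$ is real symmetric, whereas the autocorrelation matrix of a complex observed signal is Hermitian. Hermitian matrices have a number of important properties that greatly simplify computation. This section summarizes some properties of their eigenvalues and eigenvectors. Below, $^H$ denotes the conjugate transpose, which reduces to the ordinary transpose for real matrices.

Important Properties

  1. Real eigenvalues. The eigenvalues of a Hermitian matrix $A$ are always real.

    Proof: let $\lambda$ and $\mathbf{u}$ be an eigenvalue of the Hermitian matrix $A$ and a corresponding eigenvector, so that $A\mathbf{u}=\lambda\mathbf{u}$. Left-multiplying both sides by the conjugate transpose of the eigenvector gives the scalar quadratic form $\mathbf{u}^HA\mathbf{u}=\lambda\mathbf{u}^H\mathbf{u}$. Taking the conjugate transpose of both sides and using $A^H=A$ gives $\mathbf{u}^HA\mathbf{u}=\lambda^{*}\mathbf{u}^H\mathbf{u}$. The inner product $\mathbf{u}^H\mathbf{u}$ is real and positive, so $\lambda=\lambda^{*}$ and $\lambda$ must be real.

  2. Eigenpairs of the inverse. Let $(\lambda,\mathbf{u})$ be an eigenpair of the Hermitian matrix $A$. If $A$ is invertible, then $(1/\lambda,\mathbf{u})$ is an eigenpair of the inverse matrix $A^{-1}$. Proof: since $A\mathbf{u}=\lambda\mathbf{u}$, left-multiplying both sides by $A^{-1}$ gives $\mathbf{u}=\lambda A^{-1}\mathbf{u}$, and therefore $\lambda^{-1}\mathbf{u}=A^{-1}\mathbf{u}$.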
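
Both properties can be seen on a small worked example. The $2\times 2$ matrix below is chosen here purely for illustration; it equals its own conjugate transpose, so it is Hermitian.

```latex
\begin{aligned}
A &= \begin{bmatrix} 2 & i \\ -i & 2 \end{bmatrix}, \qquad
\det(A - \lambda I) = (2-\lambda)^2 - (i)(-i) = (2-\lambda)^2 - 1 = 0
\;\Longrightarrow\; \lambda_1 = 1,\ \lambda_2 = 3 \\
\mathbf{u}_1 &= \begin{bmatrix} 1 \\ i \end{bmatrix}, \quad
A\mathbf{u}_1 = \begin{bmatrix} 2 + i^2 \\ -i + 2i \end{bmatrix}
             = \begin{bmatrix} 1 \\ i \end{bmatrix} = 1\cdot\mathbf{u}_1, \qquad
\mathbf{u}_2 = \begin{bmatrix} 1 \\ -i \end{bmatrix}, \quad
A\mathbf{u}_2 = \begin{bmatrix} 2 - i^2 \\ -i - 2i \end{bmatrix}
             = \begin{bmatrix} 3 \\ -3i \end{bmatrix} = 3\,\mathbf{u}_2 \\
A^{-1} &= \frac{1}{3}\begin{bmatrix} 2 & -i \\ i & 2 \end{bmatrix}, \qquad
A^{-1}\mathbf{u}_2 = \frac{1}{3}\begin{bmatrix} 2 + i^2 \\ i - 2i \end{bmatrix}
                   = \frac{1}{3}\begin{bmatrix} 1 \\ -i \end{bmatrix}
                   = \lambda_2^{-1}\mathbf{u}_2
\end{aligned}
```

Although $A$ has complex entries, both eigenvalues are real, and $A^{-1}$ keeps the eigenvector $\mathbf{u}_2$ with eigenvalue $1/3$.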

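Steps for Computing the Eigenvectors

For an $n\times n$ Hermitian matrix $A$, suppose all of its distinct eigenvalues $\lambda_1,\lambda_2,\ldots,\lambda_n$ have been obtained by solving the characteristic equation. Its eigenvectors can then be found in two steps:

  1. Use Gaussian elimination to solve the equation:

    $$(A-\lambda I)x=0$$

    and obtain the nonzero solutions $x$ corresponding to each known $\lambda$.

  2. Use Gram-Schmidt orthogonalization to orthogonalize the solutions $x$, giving mutually orthogonal eigenvectors with unit norm.

If $\lambda_k$ is a repeated eigenvalue of the Hermitian matrix $A$ with multiplicity $m_k$, then $\mathrm{rank}(A-\lambda_kI)=n-m_k$, so every Hermitian matrix satisfies the necessary and sufficient condition of the diagonalization theorem. Hence $U^{-1}AU=\Sigma$, where $\Sigma$ is the diagonal matrix of eigenvalues.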
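
Step 2 can be sketched as follows for real vectors (for complex vectors the inner product would need a conjugate, which this sketch omits). The function names are illustrative, and the example assumes the real symmetric matrix with 2 on the diagonal and 1 elsewhere, whose eigenvalue 1 has two independent solutions of $(A - I)x = 0$.

```python
import math


def dot(u, v):
    """Real inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(vectors, tol=1e-12):
    """Orthonormalize real vectors, skipping any that are linearly dependent."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            coeff = dot(q, w)  # component of w along the unit vector q
            w = [wi - coeff * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


A = [[2.0, 1.0, 1.0], [1.0, 2.0, 1.0], [1.0, 1.0, 2.0]]
# Two independent, non-orthogonal solutions of (A - 1*I) x = 0.
solutions = [[1.0, -1.0, 0.0], [1.0, 0.0, -1.0]]
q1, q2 = gram_schmidt(solutions)
```

Because both inputs belong to the same eigenspace, the orthonormalized vectors are still eigenvectors of `A` for eigenvalue 1.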

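Important Theorem

The eigenvectors of a Hermitian matrix are linearly independent and, after orthogonalization within each repeated eigenvalue, mutually orthogonal. The eigenvector matrix $\mathbf{U}=[\mathbf{u}_1,\ldots,\mathbf{u}_n]$ is unitary and satisfies $\mathbf{U}^{-1}=\mathbf{U}^H$.

Proof:

  1. First show that eigenvectors belonging to different eigenvalues are mutually orthogonal. Let $\lambda_1\neq\lambda_2$ be eigenvalues of the Hermitian matrix $\mathbf{A}$ with corresponding eigenvectors $\mathbf{u}_1,\mathbf{u}_2$. Then:

    $$\mathbf{u}_2^H\mathbf{A}\mathbf{u}_1=\lambda_1\mathbf{u}_2^H\mathbf{u}_1 \qquad \mathbf{u}_1^H\mathbf{A}\mathbf{u}_2=\lambda_2\mathbf{u}_1^H\mathbf{u}_2$$
  2. Take the conjugate transpose of the first equation. Since $\mathbf{A}^H=\mathbf{A}$ and $\lambda_1$ is real, this gives:

    $$\mathbf{u}_1^H\mathbf{A}\mathbf{u}_2=\lambda_1\mathbf{u}_1^H\mathbf{u}_2$$

    Therefore $\lambda_1\mathbf{u}_1^H\mathbf{u}_2=\lambda_2\mathbf{u}_1^H\mathbf{u}_2$. Because $\lambda_1\neq\lambda_2$, it follows that $\mathbf{u}_1^H\mathbf{u}_2=0$, so $\mathbf{u}_1$ and $\mathbf{u}_2$ are orthogonal.

Going Further

For an $n\times n$ Hermitian matrix $A$, if $\lambda_k$ is a repeated eigenvalue with multiplicity $m_k$, then $\mathrm{rank}(A-\lambda_kI)=n-m_k$, so $A-\lambda_kI$ is singular and the equation $(A-\lambda_kI)u=0$ has exactly $m_k$ linearly independent solutions. These solutions can be made mutually orthogonal, for example by Gram-Schmidt orthogonalization. Since the columns of the eigenvector matrix $U$ are then both linearly independent and mutually orthogonal with unit norm, $U$ is unitary: $UU^H=I$, that is, $U^H=U^{-1}$.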

Matrix Representations

For a Hermitian matrix:

  1. Unitary similarity form:

    $$\mathbf{U}^H\mathbf{A}\mathbf{U}=\mathrm{diag}(\lambda_1,\lambda_2,\ldots,\lambda_n)$$
  2. Matrix decomposition form (the canonical form under unitary similarity):

    $$\mathbf{A}=\mathbf{U}\Sigma\mathbf{U}^H$$
  3. Sum form:

    $$\mathbf{A} = \sum_{i=1}^{n}\lambda_i\mathbf{u}_i\mathbf{u}_i^H$$

Quadratic Form Representation

In optimization theory and signal processing, a quadratic form can be expressed as:

$$\mathbf{x}^H\mathbf{A}\mathbf{x}=\sum_{i=1}^n\lambda_i\left|\mathbf{x}^H\mathbf{u}_i\right|^2$$

Inverse Matrix Representation

Series expansion of $\mathbf{A}^{-1}$:

$$\mathbf{A}^{-1}=\sum_{i=1}^n\lambda_i^{-1}\mathbf{u}_i\mathbf{u}_i^H$$

So once the eigenvalue decomposition of $\mathbf{A}$ is known, $\mathbf{A}^{-1}$ is easy to obtain.
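
A minimal sketch of that expansion for a real symmetric matrix, where the conjugate transpose is just the transpose. The eigenpairs of the example matrix are supplied by hand (eigenvalues 1 and 3) instead of being computed, and the function name `spectral_inverse` is an illustrative assumption.

```python
import math


def spectral_inverse(eigenvalues, eigenvectors):
    """Sum of (1 / lambda_i) * u_i u_i^T over real orthonormal eigenvectors u_i."""
    n = len(eigenvectors[0])
    inverse = [[0.0] * n for _ in range(n)]
    for lam, u in zip(eigenvalues, eigenvectors):
        for i in range(n):
            for j in range(n):
                inverse[i][j] += u[i] * u[j] / lam
    return inverse


s = 1.0 / math.sqrt(2.0)
A = [[2.0, 1.0], [1.0, 2.0]]
# Hand-derived orthonormal eigenpairs of A: (1, [s, -s]) and (3, [s, s]).
A_inv = spectral_inverse([1.0, 3.0], [[s, -s], [s, s]])
```

Every eigenvalue must be nonzero for the sum to exist, which matches the requirement that the matrix be invertible.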

Definite Matrices

Given a Hermitian matrix $\mathbf{M}$ (a complex matrix equal to its conjugate transpose), $\mathbf{M}$ is positive definite if $z^H\mathbf{M}z$ is positive for every nonzero complex column vector $z$. Negative definite and negative semidefinite matrices are defined analogously. A matrix that is neither positive semidefinite nor negative semidefinite is sometimes called indefinite.

Definition

For a real symmetric matrix $M$:

$$\begin{aligned} M\text{ positive-definite}\quad&\Longleftrightarrow\quad\mathbf{x}^\top M\,\mathbf{x}>0\text{ for all }\mathbf{x}\in\mathbb{R}^n\setminus\{\mathbf{0}\} \\ M\text{ positive semi-definite}\quad&\Longleftrightarrow\quad\mathbf{x}^\top M\,\mathbf{x}\geq 0\text{ for all }\mathbf{x}\in\mathbb{R}^n \\ M\text{ negative-definite}\quad&\Longleftrightarrow\quad\mathbf{x}^\top M\,\mathbf{x}<0\text{ for all }\mathbf{x}\in\mathbb{R}^n\setminus\{\mathbf{0}\} \\ M\text{ negative semi-definite}\quad&\Longleftrightarrow\quad\mathbf{x}^\top M\,\mathbf{x}\leq 0\text{ for all }\mathbf{x}\in\mathbb{R}^n \end{aligned}$$

Likewise, for a Hermitian matrix $M$:

$$M\text{ positive-definite}\quad\Longleftrightarrow\quad\mathbf{z}^{*}M\,\mathbf{z}>0\text{ for all }\mathbf{z}\in\mathbb{C}^n\setminus\{\mathbf{0}\}$$

and so on for the other three cases.

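Properties

A matrix $M$ is positive definite if and only if it satisfies any one of the following equivalent conditions.

  • $M$ is congruent to a diagonal matrix with positive real entries.
  • $M$ is symmetric or Hermitian, and all of its eigenvalues are real and positive.
  • $M$ is symmetric or Hermitian, and all of its leading principal minors are positive.
  • There exists an invertible matrix $B$ such that $M=B^HB$.

A matrix is positive semidefinite if it satisfies the analogous conditions with "positive" replaced by "nonnegative", "invertible matrix" replaced by "matrix", and the word "leading" removed.

Positive definite and positive semidefinite real matrices are fundamental to convex optimization. Given a twice-differentiable function of several real variables, if its Hessian matrix (the matrix of its second partial derivatives) is positive definite at a point $p$, then the function is convex near $p$. Conversely, if the function is convex near $p$, then the Hessian matrix is positive semidefinite at $p$.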
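
The leading-principal-minor condition for positive definiteness can be sketched for small real symmetric matrices as below. The function names are illustrative assumptions, the cofactor-expansion determinant is only practical for small sizes, and the code does not itself verify symmetry.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    return sum(
        (-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
        for j in range(len(m))
    )


def is_positive_definite(m):
    """Check that every leading principal minor of a real symmetric matrix is positive."""
    return all(det([row[:k] for row in m[:k]]) > 0 for k in range(1, len(m) + 1))


M_pd = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]  # minors 2, 3, 4
M_indefinite = [[1.0, 2.0], [2.0, 1.0]]  # minors 1, -3

pd_result = is_positive_definite(M_pd)
indefinite_result = is_positive_definite(M_indefinite)
```

For larger matrices a Cholesky factorization attempt is the usual practical test; the minor-based check is shown because it maps directly onto the condition stated in the list.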