共轭方向(Conjugate Direction)
设 A A A 为 n n n 阶对称正定矩阵 ,若有一组非令向量组 u ( 1 ) , u ( 2 ) , ⋯ , u ( n ) ∈ R n u^{(1)},u^{(2)},\cdots,u^{(n)} \in \mathbb R^n u ( 1 ) , u ( 2 ) , ⋯ , u ( n ) ∈ R n 满足
( u ( i ) ) T A u ( j ) = 0 ( i ≠ j ; i , j = 1 , 2 , ⋯ , n ) (u^{(i)})^{\text T}Au^{(j)}=0\quad(i\neq j; \quad i,j = 1,2,\cdots,n) ( u ( i ) ) T A u ( j ) = 0 ( i = j ; i , j = 1 , 2 , ⋯ , n )
则称该向量组为 A A A 共轭。若 A = I A=I A = I 则上述条件即为通常的正交条件。因此可以将 A A A 共轭的概念视为正交概念的推广。
若向量组 u ( 1 ) , u ( 2 ) , ⋯ , u ( n ) u^{(1)},u^{(2)},\cdots,u^{(n)} u ( 1 ) , u ( 2 ) , ⋯ , u ( n ) 为 A A A 共轭的非零向量,则可证这一组向量线性独立。
n n n 阶对称正定矩阵 A A A 的特征向量组为 A A A 共轭。
比如
上图就是关于某一对称正定矩阵 A A A 共轭的两个向量 u , v u,v u , v ,记为 ⟨ u , v ⟩ A \langle u,v \rangle_A ⟨ u , v ⟩ A 。可以看出向量 u , v u,v u , v 关于某条对称轴对称,则它们经过正交变换的向量 A u , A v Au,Av A u , A v 也关于这条对称轴对称。这可能就是“共轭”的几何含义吧。
既然 A A A 的某一完备的共轭的向量组 u 1 , u 2 , ⋯ , u n u_{1},u_{2},\cdots,u_{n} u 1 , u 2 , ⋯ , u n 线性无关,那么任一向量 x ∈ R n \mathbb x \in \mathbb R^n x ∈ R n 都可以用这组向量表示,有
x = ∑ i = 1 n α i u ( i ) \mathbb x = \sum_{i=1}^n \alpha_iu^{(i)} x = i = 1 ∑ n α i u ( i )
将上式左右同乘 u k T A u_{k}^TA u k T A ,有
u k T A x = u k T A ∑ i = 1 n α i u i = α k u k T A u k \begin{align}
u_{k}^TA\mathbf x &= u_{k}^TA \sum_{i=1}^n \alpha_iu_{i}\\
&=\alpha_ku_{k}^TA u_{k}
\end{align} u k T A x = u k T A i = 1 ∑ n α i u i = α k u k T A u k
可得
α k = u k T A x u k T A u k \alpha_k = \frac{u_{k}^TA\mathbf x }{u_{k}^TAu_{k}} α k = u k T A u k u k T A x
进而可得 x \mathbb x x 表达式
x = ∑ i = 1 n u i T A x u i T A u i u i \mathbb x = \sum_{i=1}^n \frac{u_{i}^TA\mathbf x}{u_{i}^TA u_{i}}u_i x = i = 1 ∑ n u i T A u i u i T A x u i
如果有方程组 A x = b A\mathbf x = b A x = b ,且 A A A 存在共轭的向量组 u 1 , u 2 , ⋯ , u n u_{1},u_{2},\cdots,u_{n} u 1 , u 2 , ⋯ , u n ,那么对满足方程组的解 x \mathbf x x ,会有
x = ∑ i = 1 n u i T A x u i T A u i u i = ∑ i = 1 n u i T b u i T A u i u i \mathbb x = \sum_{i=1}^n \frac{u_{i}^TA\mathbf x}{u_{i}^TA u_{i}}u_i = \sum_{i=1}^n \frac{u_{i}^Tb}{u_{i}^TA u_{i}}u_i x = i = 1 ∑ n u i T A u i u i T A x u i = i = 1 ∑ n u i T A u i u i T b u i
共轭梯度法(Conjugate Gradient)
共轭梯度算法用于处理无约束的多元二次函数的极小值
f ( x ) = 1 2 x T A x + b T x + c f(x) = \frac 1 2x^\text TAx+b^\text Tx+c f ( x ) = 2 1 x T A x + b T x + c
其中 x , b ∈ R n x,b\in \mathbb R^n x , b ∈ R n ;A为对称正定矩阵,A ∈ R n × n A\in \mathbb R^{n\times n} A ∈ R n × n ;c c c 为常数。
刚在接触这个算法的时候一脸懵逼,哪来的这么好的式子让我求解?哪来的刚好对称正定的方阵?要解释这个问题,以及这个式子代表着什么,还得联系着牛顿法,将两种方法串联起来。
说到牛顿法,这个问题我也想了好久,到底这个关键的二阶泰勒展开 到底是什么意思?到底能近似到什么程度?关于这个问题想要继续讨论,可以花些时间看一下 这个视频 ?,简单来说
f ( x ) ≈ f ( x k ) + f ′ ( x k ) ( x − x k ) + f ′ ′ ( x k ) 2 ( x − x k ) 2 f(x) \approx f(x_k) + f'(x_k)(x-x_k) + \frac {f”(x_{k})}{2}(x-x_k)^2 f ( x ) ≈ f ( x k ) + f ′ ( x k ) ( x − x k ) + 2 f ′′ ( x k ) ( x − x k ) 2
是在 x k x_k x k 点附近关于 f ( x ) f(x) f ( x ) 的一阶以及二阶导数的大小变化进行提取,并且赋给一个二次曲线,这个二次曲线可以在某些区间内完美的模拟原函数 f ( x ) f(x) f ( x ) 的一、二阶导数的变化情况。当然,关于比较复杂的函数,这种低阶展开的对曲线的模拟效果不是很好,只能在展开点的附近才能有较小的误差。而关于牛顿法,因为牛顿法可以完美解决二次函数极小值的求解问题,所以牛顿法就需要用到目标函数 f ( x ) f(x) f ( x ) 的二阶泰勒展开啦。
再来看在 x k x_k x k 点处二阶泰勒展开的矩阵形式
f ( x ) = f ( x k ) + ∇ f ( x k ) T ( x − x k ) + 1 2 ( x − x k ) T H ( x k ) ( x − x k ) f(x) = f(x_k) + \nabla f(x_k)^\text T(x-x_k) + \frac 1 2 (x-x_k)^\text TH(x_{k})(x-x_k) f ( x ) = f ( x k ) + ∇ f ( x k ) T ( x − x k ) + 2 1 ( x − x k ) T H ( x k ) ( x − x k )
是不是很像共轭梯度法要解决的二次函数
f ( x ) = c + b T x + 1 2 x T A x f(x) = c+b^\text Tx+\frac 1 2x^\text TAx f ( x ) = c + b T x + 2 1 x T A x
其实牛顿法的本质就是用二次曲线来拟合目标函数的某一局部,并求出最优点 x ∗ x^* x ∗ 。而共轭梯度法可以被视作基于这一思想的另一种算法。海塞矩阵 H H H 正定可以保证 f ( x ) f(x) f ( x ) 存在极小点,这似乎可以解释为什么共轭梯度法上来就让我们算这么奇怪的式子。(别忘了,任何一个多元函数二次项都可以写成带有对称矩阵 A A A 的 x T A x x^\text TAx x T A x 形式)
已知牛顿法的表达式
x k + 1 = x k − H − 1 ( x k ) ⋅ ∇ f ( x k ) x_{k+1} = x_k -H^{-1}(x_k) \cdot \nabla f(x_k) x k + 1 = x k − H − 1 ( x k ) ⋅ ∇ f ( x k )
都知道这个算法中 − H − 1 ( x k ) ⋅ ∇ f ( x k ) -H^{-1}(x_k) \cdot \nabla f(x_k) − H − 1 ( x k ) ⋅ ∇ f ( x k ) 计算困难,那么令 d = H − 1 ( x k ) ⋅ ∇ f ( x k ) d = H^{-1}(x_k) \cdot \nabla f(x_k) d = H − 1 ( x k ) ⋅ ∇ f ( x k ) ,则有
H ( x k ) ⋅ d = − ∇ f ( x k ) H(x_k)\cdot d = -\nabla f(x_k) H ( x k ) ⋅ d = − ∇ f ( x k )
如果有 H H H 的共轭的向量组 u 1 , u 2 , ⋯ , u n u_{1},u_{2},\cdots,u_{n} u 1 , u 2 , ⋯ , u n ,类比共轭方向解方程组,有
d = ∑ i = 1 n − u i T ∇ f ( x k ) u i T H u i u i d = \sum_{i=1}^n -\frac{u_{i}^T\nabla f(x_k)}{u_{i}^TH u_{i}}u_i d = i = 1 ∑ n − u i T H u i u i T ∇ f ( x k ) u i
别忘了,我们现在求解的是二次函数,使用牛顿法一步就能就解决问题,有
x ∗ = x 0 − H − 1 ∇ f ( x 0 ) = x 0 + d = x 0 + ∑ i = 1 n u i T ∇ f ( x 0 ) u i T H u i u i \begin{align}
x^* &= x_0 – H^{-1} \nabla f(x_0)\\
&= x_0+d\\
&= x_0 + \sum_{i=1}^n \frac{u_{i}^T\nabla f(x_0)}{u_{i}^TH u_{i}}u_i
\end{align} x ∗ = x 0 − H − 1 ∇ f ( x 0 ) = x 0 + d = x 0 + i = 1 ∑ n u i T H u i u i T ∇ f ( x 0 ) u i
将式子展开
x ∗ = x 0 + ∑ i = 1 n u i T ∇ f ( x 0 ) u i T H u i u i x^* = x_0 + \sum_{i=1}^n \frac{u_{i}^T\nabla f(x_0)}{u_{i}^TH u_{i}}u_i x ∗ = x 0 + i = 1 ∑ n u i T H u i u i T ∇ f ( x 0 ) u i
上式也可以可以换一种表达形式:
假设有一组 H H H 的共轭的向量组 u 1 , u 2 , ⋯ , u n u_{1},u_{2},\cdots,u_{n} u 1 , u 2 , ⋯ , u n ,可以写出迭代式
x k = x 0 + β 1 u 1 + β 2 u 2 + ⋯ + β k u k x_k = x_0 + \beta_1u_1 + \beta_2u_2+\dots+\beta_ku_k x k = x 0 + β 1 u 1 + β 2 u 2 + ⋯ + β k u k
那么最优解就可以用 n n n 次迭代之后的迭代式来表示
x ∗ = x 0 + β 1 u 1 + β 2 u 2 + ⋯ + β n u n x^* = x_0 + \beta_1u_1 + \beta_2u_2+\dots+\beta_nu_n x ∗ = x 0 + β 1 u 1 + β 2 u 2 + ⋯ + β n u n
上式如果除去 x 0 x_0 x 0 项明显是一个方程组,那么 β k \beta_k β k 可以用共轭向量法求解出来,有
β k = u k T H ( x ∗ − x 0 ) u k T H u k \beta_k = \frac{u_{k}^TH(x^*-x_0)}{u_{k}^TH u_{k}} β k = u k T H u k u k T H ( x ∗ − x 0 )
令 x ∗ − x 0 = ( x ∗ − x k − 1 ) + ( x k − 1 − x 0 ) x^*-x_0 = (x^*-x_{k-1})+(x_{k-1}-x_0) x ∗ − x 0 = ( x ∗ − x k − 1 ) + ( x k − 1 − x 0 ) ,则
u k T H ( x ∗ − x 0 ) = u k T H ( x ∗ − x k − 1 ) + u k T H ( x k − 1 − x 0 ) = u k T H ( x ∗ − x k − 1 ) + u k T H ( β 1 u 1 + β 2 u 2 + ⋯ + β k − 1 u k − 1 ) = u k T H ( x ∗ − x k − 1 ) \begin{align}
u_{k}^TH(x^*-x_0) &= u_{k}^TH(x^*-x_{k-1})+u_{k}^TH(x_{k-1}-x_0)\\
&= u_{k}^TH(x^*-x_{k-1}) + u_k^TH(\beta_1u_1+\beta_2u_2+\cdots+\beta_{k-1}u_{k-1})\\
&= u_{k}^TH(x^*-x_{k-1})
\end{align} u k T H ( x ∗ − x 0 ) = u k T H ( x ∗ − x k − 1 ) + u k T H ( x k − 1 − x 0 ) = u k T H ( x ∗ − x k − 1 ) + u k T H ( β 1 u 1 + β 2 u 2 + ⋯ + β k − 1 u k − 1 ) = u k T H ( x ∗ − x k − 1 )
根据
f ( x ) = f ( x k ) + ∇ f ( x k ) T ( x − x k ) + 1 2 ( x − x k ) T H ( x − x k ) f(x) = f(x_k) + \nabla f(x_k)^\text T(x-x_k) + \frac 1 2 (x-x_k)^\text TH(x-x_k) f ( x ) = f ( x k ) + ∇ f ( x k ) T ( x − x k ) + 2 1 ( x − x k ) T H ( x − x k )
对其求导,并且将 x ∗ x^* x ∗ 代入式子中
∇ f ( x ∗ ) = ∇ f ( x k ) + H ( x ∗ − x k ) = 0 \nabla f(x^*) = \nabla f(x_k) +H(x^*-x_k)=0 ∇ f ( x ∗ ) = ∇ f ( x k ) + H ( x ∗ − x k ) = 0
将其代入 u k T H ( x ∗ − x 0 ) u_{k}^TH(x^*-x_0) u k T H ( x ∗ − x 0 ) ,得
u k T H ( x ∗ − x 0 ) = u k T H ( x ∗ − x k − 1 ) = − u k T ∇ f ( x k − 1 ) \begin{align}
u_{k}^TH(x^*-x_0) &= u_{k}^TH(x^*-x_{k-1})\\
&= -u_{k}^T\nabla f(x_{k-1})
\end{align} u k T H ( x ∗ − x 0 ) = u k T H ( x ∗ − x k − 1 ) = − u k T ∇ f ( x k − 1 )
也那么
β k = u k T H ( x ∗ − x 0 ) u k T H u k = − u k T ∇ f ( x k − 1 ) u k T H u k \beta_k = \frac{u_{k}^TH(x^*-x_0)}{u_{k}^TH u_{k}} = -\frac{u_{k}^T\nabla f(x_{k-1})}{u_{k}^TH u_{k}} β k = u k T H u k u k T H ( x ∗ − x 0 ) = − u k T H u k u k T ∇ f ( x k − 1 )
怎么回事?由于是同一组基,那么势必 α k = β k \alpha_k = \beta_k α k = β k ,那也就代表着
u k T ∇ f ( x k − 1 ) = u k T ∇ f ( x 0 ) u_{k}^T\nabla f(x_{k-1}) = u_{k}^T\nabla f(x_0) u k T ∇ f ( x k − 1 ) = u k T ∇ f ( x 0 )
我们可以将 ∇ f ( x k ) \nabla f(x_k) ∇ f ( x k ) 拆解
∇ f ( x k ) = − H ( x ∗ − x k ) = − H ( x ∗ − ( x k − 1 + H β k u k ) ) = ∇ f ( x k − 1 ) + H β k u k = ∇ f ( x k − 2 ) + H β k − 1 u k − 1 + H β k u k = … = ∇ f ( x 0 ) + H ∑ i = 1 k β i u i \begin{align}
\nabla f(x_k) &= -H(x^*-x_{k})\\
&= -H(x^*-(x_{k-1}+H\beta_ku_k))\\
&= \nabla f(x_{k-1})+H\beta_ku_k\\
&= \nabla f(x_{k-2})+H\beta_{k-1}u_{k-1}+H\beta_ku_k\\
&= \dots\\
&= \nabla f(x_{0})+H\sum_{i=1}^k\beta_iu_i\\
\end{align} ∇ f ( x k ) = − H ( x ∗ − x k ) = − H ( x ∗ − ( x k − 1 + H β k u k )) = ∇ f ( x k − 1 ) + H β k u k = ∇ f ( x k − 2 ) + H β k − 1 u k − 1 + H β k u k = … = ∇ f ( x 0 ) + H i = 1 ∑ k β i u i
这样我们就可以得出一个递推式,不仅能解决上面的问题,如果我们将 ∇ f ( x k ) \nabla f(x_k) ∇ f ( x k ) 左乘 x k T x_k^\text T x k T ,会有
u k T ∇ f ( x k ) = u k T ∇ f ( x 0 ) + u k T H ∑ i = 1 k β i u i = − u k T H ( x ∗ − x 0 ) + β k u k T H u k = − β k u k T H u k + β k u k T H u k = 0 \begin{align}
u_k^\text T\nabla f(x_k) &= u_k^\text T\nabla f(x_{0})+u_k^\text TH\sum_{i=1}^k\beta_iu_i\\
&= -u_k^\text TH(x^*-x_0) + \beta_ku_k^\text THu_k\\
&= -\beta_ku_k^\text THu_k + \beta_ku_k^\text THu_k\\
&= 0
\end{align} u k T ∇ f ( x k ) = u k T ∇ f ( x 0 ) + u k T H i = 1 ∑ k β i u i = − u k T H ( x ∗ − x 0 ) + β k u k T H u k = − β k u k T H u k + β k u k T H u k = 0
也就是说,点 x k x_k x k 处的梯度与第 k k k 次的搜索方向正交。
如果将 f ( x k ) f(x_k) f ( x k ) 写成 f ( x k − 1 + β k u k ) f(x_{k-1}+\beta_ku_k) f ( x k − 1 + β k u k ) ,那么如果需要求在 u k u_k u k 方向的最优步长,即
min β k f ( x k − 1 + β k u k ) \min_{\beta_k}f(x_{k-1}+\beta_ku_k) β k min f ( x k − 1 + β k u k )
需要对 f ( x k ) f(x_k) f ( x k ) 求偏导
∂ f ( x k − 1 + β k u k ) ∂ β k = u k T ∇ f ( x k ) = 0 \begin{align}
\frac{\partial f(x_{k-1}+\beta_ku_k)}{\partial \beta_k} &= u_k^\text T\nabla f(x_{k})= 0
\end{align} ∂ β k ∂ f ( x k − 1 + β k u k ) = u k T ∇ f ( x k ) = 0
得到了同样的结论。在解释之前,先看下面一幅图
以目标函数 f ( x ) = x 1 2 + x 2 2 f(x) = x_1^2 + x_2^2 f ( x ) = x 1 2 + x 2 2 为例,在沿着 u 1 u_1 u 1 方向搜索时,选择了在这个方向上的最优解,也就是 min β 1 f ( x 0 + β 1 u 1 ) \min_{\beta_1}f(x_0+\beta_1u_1) min β 1 f ( x 0 + β 1 u 1 ) ,这样也就注定点 x 1 x_1 x 1 处的斜率不会再 u 1 u_1 u 1 方向的分量上还有偏移。这与我们刚开始提出的方程组的结果 α \alpha α 不谋而合。
那么另一个问题出现了,上哪里去找一组关于 H H H 的共轭向量呢?在线性代数中有提到过一个向量正交化的过程:Gram-Schmidt 正交化
施密特正交化中的投影算子 是正交化的关键,投影算子 proj u ( v ) \text{proj}_u(v) proj u ( v ) 来表示向量 v v v 在向量 u u u 方向上的分量
proj u ( v ) = ⟨ v , u ⟩ ⟨ u , u ⟩ u \text{proj}_u(v) = \frac{\langle v,u \rangle}{\langle u,u \rangle}u proj u ( v ) = ⟨ u , u ⟩ ⟨ v , u ⟩ u
这样,我们就可以根据一组向量 v 1 , v 2 , ⋯ , v n v_{1},v_{2},\cdots,v_{n} v 1 , v 2 , ⋯ , v n 来得到一组正交基 u 1 , u 2 , ⋯ , u n u_{1},u_{2},\cdots,u_{n} u 1 , u 2 , ⋯ , u n :
u 1 = v 1 u 2 = v 2 − ⟨ v 2 , u 1 ⟩ ⟨ u 1 , u 1 ⟩ u 1 u 3 = v 3 − ⟨ v 3 , u 1 ⟩ ⟨ u 1 , u 1 ⟩ u 1 − ⟨ v 3 , u 2 ⟩ ⟨ u 2 , u 2 ⟩ u 2 ⋯ u k = v k − ∑ i = 1 k − 1 ⟨ v k , u i ⟩ ⟨ u i , u i ⟩ u i \begin{align}
u_1 &= v_1\\
u_2 &= v_2 – \frac{\langle v_2,u_1 \rangle}{\langle u_1,u_1 \rangle}u_1\\
u_3 &= v_3 – \frac{\langle v_3,u_1 \rangle}{\langle u_1,u_1 \rangle}u_1-\frac{\langle v_3,u_2 \rangle}{\langle u_2,u_2 \rangle}u_2\\
\cdots\\
u_k &= v_k – \sum_{i=1}^{k-1}\frac{\langle v_k,u_i \rangle}{\langle u_i,u_i \rangle}u_i
\end{align} u 1 u 2 u 3 ⋯ u k = v 1 = v 2 − ⟨ u 1 , u 1 ⟩ ⟨ v 2 , u 1 ⟩ u 1 = v 3 − ⟨ u 1 , u 1 ⟩ ⟨ v 3 , u 1 ⟩ u 1 − ⟨ u 2 , u 2 ⟩ ⟨ v 3 , u 2 ⟩ u 2 = v k − i = 1 ∑ k − 1 ⟨ u i , u i ⟩ ⟨ v k , u i ⟩ u i
上面这个过程我们应该很熟悉了。同样的,可以根据一组向量 v 1 , v 2 , ⋯ , v n v_{1},v_{2},\cdots,v_{n} v 1 , v 2 , ⋯ , v n 来得到一组关于 A A A 的共轭向量 u 1 , u 2 , ⋯ , u n u_{1},u_{2},\cdots,u_{n} u 1 , u 2 , ⋯ , u n :
u 1 = v 1 u 2 = v 2 − ⟨ v 2 , u 1 ⟩ A ⟨ u 1 , u 1 ⟩ A u 1 u 3 = v 3 − ⟨ v 3 , u 1 ⟩ A ⟨ u 1 , u 1 ⟩ A u 1 − ⟨ v 3 , u 2 ⟩ A ⟨ u 2 , u 2 ⟩ A u 2 ⋯ u k = v k − ∑ i = 1 k − 1 ⟨ v k , u i ⟩ A ⟨ u i , u i ⟩ A u i \begin{align}
u_1 &= v_1\\
u_2 &= v_2 – \frac{\langle v_2,u_1 \rangle_A}{\langle u_1,u_1 \rangle_A}u_1\\
u_3 &= v_3 – \frac{\langle v_3,u_1 \rangle_A}{\langle u_1,u_1 \rangle_A}u_1-\frac{\langle v_3,u_2 \rangle_A}{\langle u_2,u_2 \rangle_A}u_2\\
\cdots\\
u_k &= v_k – \sum_{i=1}^{k-1}\frac{\langle v_k,u_i \rangle_A}{\langle u_i,u_i \rangle_A}u_i
\end{align} u 1 u 2 u 3 ⋯ u k = v 1 = v 2 − ⟨ u 1 , u 1 ⟩ A ⟨ v 2 , u 1 ⟩ A u 1 = v 3 − ⟨ u 1 , u 1 ⟩ A ⟨ v 3 , u 1 ⟩ A u 1 − ⟨ u 2 , u 2 ⟩ A ⟨ v 3 , u 2 ⟩ A u 2 = v k − i = 1 ∑ k − 1 ⟨ u i , u i ⟩ A ⟨ v k , u i ⟩ A u i
我们以 f ( x 0 ) f(x_0) f ( x 0 ) 的负梯度方向作为初始方向,有搜索方向的迭代式
u k = − ∇ f ( x k − 1 ) + ∑ i = 1 k − 1 γ i u i , γ i k = ⟨ ∇ f ( x k − 1 ) , u i ⟩ H ⟨ u i , u i ⟩ H u_k = -\nabla f(x_{k-1}) + \sum_{i=1}^{k-1}\gamma_iu_i,\quad\gamma_i^k =\frac{\langle \nabla f(x_{k-1}),u_i \rangle_H}{\langle u_i,u_i \rangle_H} u k = − ∇ f ( x k − 1 ) + i = 1 ∑ k − 1 γ i u i , γ i k = ⟨ u i , u i ⟩ H ⟨ ∇ f ( x k − 1 ) , u i ⟩ H
这样还是有点太复杂了,随着迭代步数的增多,计算次数也会不断上涨,能不能再简化一下呢?
我们定义残差 e k = x k − x ∗ e_k = x_k-x^* e k = x k − x ∗ ,有
e k = x k − x ∗ = ( x 0 + ∑ i = 1 k β i u i ) − ( x 0 + ∑ i = 1 n β i u i ) = ∑ i = k + 1 n β i u i \begin{align}
e_k &= x_k-x^*\\
&=\left(x_0 +\sum_{i=1}^k\beta_iu_i\right) – \left(x_0 +\sum_{i=1}^n\beta_iu_i\right)\\
&= \sum_{i=k+1}^n\beta_iu_i
\end{align} e k = x k − x ∗ = ( x 0 + i = 1 ∑ k β i u i ) − ( x 0 + i = 1 ∑ n β i u i ) = i = k + 1 ∑ n β i u i
这样我们可以得出一个结论:前 k k k 步的搜索方向均与残差 e k e_k e k 关于 H H H 共轭 ,即
u i T H e k = 0 , i = 1 , 2 , … , k u_i^\text THe_k = 0,\quad i=1,2,\dots,k u i T H e k = 0 , i = 1 , 2 , … , k
又因为 H ( x k − x ∗ ) = ∇ f ( x k ) H(x_k-x^*) = \nabla f(x_k) H ( x k − x ∗ ) = ∇ f ( x k ) ,代入上式
0 = u i T H e k = u i T ∇ f ( x k ) = − ∇ f ( x k ) T ∇ f ( x i ) − ∑ j = 1 i − 1 γ j i ∇ f ( x k ) T u j = − ∇ f ( x k ) T ∇ f ( x i ) , i = 1 , 2 , … , k \begin{align}
0&=u_i^\text THe_k\\
& = u_i^\text T\nabla f(x_k)\\
&= -\nabla f(x_k)^\text T\nabla f(x_i) – \sum_{j=1}^{i-1}\gamma_j^i \nabla f(x_k)^\text Tu_j\\
&= -\nabla f(x_k)^\text T\nabla f(x_i),\quad i=1,2,\dots,k
\end{align} 0 = u i T H e k = u i T ∇ f ( x k ) = − ∇ f ( x k ) T ∇ f ( x i ) − j = 1 ∑ i − 1 γ j i ∇ f ( x k ) T u j = − ∇ f ( x k ) T ∇ f ( x i ) , i = 1 , 2 , … , k
这样我们就得出了第二个结论:梯度方向相互正交
在共轭梯度法迭代过程中有迭代式
x k = x k − 1 + β k u k x_{k} = x_{k-1}+\beta_{k}u_{k} x k = x k − 1 + β k u k
左右同减 x ∗ x^* x ∗ 再左乘 H H H ,有
H ( x k − x ∗ ) = H ( x k − 1 − x ∗ ) + β k H u k ∇ f ( x k ) = ∇ f ( x k − 1 ) + β k H u k \begin{align}
H(x_{k}-x^*) &= H(x_{k-1}-x^*) + \beta_{k}Hu_{k}\\
\nabla f(x_k)&= \nabla f(x_{k-1}) + \beta_{k}Hu_{k}
\end{align} H ( x k − x ∗ ) ∇ f ( x k ) = H ( x k − 1 − x ∗ ) + β k H u k = ∇ f ( x k − 1 ) + β k H u k
那么
H u k = 1 β k ( ∇ f ( x k ) − ∇ f ( x k − 1 ) ) Hu_k = \frac 1 \beta_k(\nabla f(x_k) – \nabla f(x_{k-1})) H u k = β 1 k ( ∇ f ( x k ) − ∇ f ( x k − 1 ))
左乘 ∇ f ( x i ) T \nabla f(x_i)^\text T ∇ f ( x i ) T ,根据梯度方向相互正交,
∇ f ( x i ) T H u k = 1 β k ( ∇ f ( x i ) T ∇ f ( x k ) − ∇ f ( x i ) T ∇ f ( x k − 1 ) ) = { − 1 β k ∇ f ( x k − 1 ) T ∇ f ( x k − 1 ) , i = k − 1 1 β k ∇ f ( x k ) T ∇ f ( x k ) , i = k 0 , otherwise \begin{align}
\nabla f(x_i)^\text THu_k &= \frac 1 \beta_k(\nabla f(x_i)^\text T\nabla f(x_k) – \nabla f(x_i)^\text T\nabla f(x_{k-1}))\\
&=
\begin{cases}
-\frac 1 \beta_k\nabla f(x_{k-1})^\text T\nabla f(x_{k-1}) ,&i=k-1\\
\frac 1 \beta_k\nabla f(x_k)^\text T\nabla f(x_k) ,&i=k\\
0,&\text {otherwise}
\end{cases}
\end{align} ∇ f ( x i ) T H u k = β 1 k ( ∇ f ( x i ) T ∇ f ( x k ) − ∇ f ( x i ) T ∇ f ( x k − 1 )) = ⎩ ⎨ ⎧ − β 1 k ∇ f ( x k − 1 ) T ∇ f ( x k − 1 ) , β 1 k ∇ f ( x k ) T ∇ f ( x k ) , 0 , i = k − 1 i = k otherwise
我们再来看搜索方向的迭代式系数
γ i k = ⟨ ∇ f ( x k − 1 ) , u i ⟩ H ⟨ u i , u i ⟩ H = ∇ f ( x k − 1 ) T H u i u i T H u i \gamma_i^k =\frac{\langle \nabla f(x_{k-1}),u_i \rangle_H}{\langle u_i,u_i \rangle_H} =
\frac{\nabla f(x_{k-1})^\text THu_i}{u_i^\text THu_i} γ i k = ⟨ u i , u i ⟩ H ⟨ ∇ f ( x k − 1 ) , u i ⟩ H = u i T H u i ∇ f ( x k − 1 ) T H u i
当 k < i − 1 k < i-1 k < i − 1 时,γ i k = 0 \gamma_i^k = 0 γ i k = 0 ,那么
u k = − ∇ f ( x k − 1 ) + ∑ i = 1 k − 1 γ i k u i = − ∇ f ( x k − 1 ) + γ k − 1 k u k − 1 = − ∇ f ( x k − 1 ) + ∇ f ( x k − 1 ) T H u k − 1 u k − 1 T H u k − 1 u k − 1 = − ∇ f ( x k − 1 ) + ∇ f ( x k − 1 ) T ∇ f ( x k − 1 ) β k u k − 1 T H u k − 1 u k − 1 \begin{align}
u_k &= -\nabla f(x_{k-1}) + \sum_{i=1}^{k-1}\gamma_i^{k}u_i\\
&= -\nabla f(x_{k-1})+\gamma_{k-1}^{k}u_{k-1}\\
&= -\nabla f(x_{k-1})+\frac{\nabla f(x_{k-1})^\text THu_{k-1}}{u_{k-1}^\text THu_{k-1}}u_{k-1}\\
&=-\nabla f(x_{k-1})+\frac{\nabla f(x_{k-1})^\text T\nabla f(x_{k-1})}{\beta_ku_{k-1}^\text THu_{k-1}}u_{k-1}
\end{align} u k = − ∇ f ( x k − 1 ) + i = 1 ∑ k − 1 γ i k u i = − ∇ f ( x k − 1 ) + γ k − 1 k u k − 1 = − ∇ f ( x k − 1 ) + u k − 1 T H u k − 1 ∇ f ( x k − 1 ) T H u k − 1 u k − 1 = − ∇ f ( x k − 1 ) + β k u k − 1 T H u k − 1 ∇ f ( x k − 1 ) T ∇ f ( x k − 1 ) u k − 1
由此可以看出,根据 Gram-Schmidt 方法,第 k k k 个共轭向量只与第 k − 1 k-1 k − 1 个共轭向量有关系,关于共轭向量的计算被大大的简化了。