Francesco Tudisco

Associate Professor (Reader) in Machine Learning

School of Mathematics, The University of Edinburgh
The Maxwell Institute for Mathematical Sciences
School of Mathematics, Gran Sasso Science Institute
JCMB, King’s Buildings, Edinburgh EH9 3FD UK
email: f dot tudisco at ed.ac.uk

Preconditioned CG


The conjugate gradient (CG) method is one of the most powerful methods for the solution of linear systems $Ax_\star=b$, where $A$ is a symmetric positive definite matrix. It is particularly convenient for large and sparse matrices since, at each step, the only operation involving $A$ we need is the product of $A$ with a vector. Recall that the convergence rate of the method can be estimated by looking at the energy norm of the error $E_{k} = x_k - x_\star$. The following upper bound holds
\begin{equation}\label{eq:CG-conv-rate}
\frac{\|E_{k}\|_A}{\|E_0\|_A}\leq 2\left(\frac{\sqrt{\kappa(A)}-1}{\sqrt{\kappa(A)}+1}\right)^k,
\end{equation}
where $\kappa(A)=\lambda_{\max}(A)/\lambda_{\min}(A)$ is the $\ell^2$ condition number of $A$.
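
As a quick illustration of how to use the bound: with $\rho = (\sqrt{\kappa(A)}-1)/(\sqrt{\kappa(A)}+1)$, requiring $2\rho^k \leq \varepsilon$ gives $k \geq \log(\varepsilon/2)/\log(\rho)$. The MATLAB sketch below evaluates this for an illustrative value of $\kappa(A)$; the numbers are placeholders, not part of the lab.

% Sketch: how many iterations the bound \eqref{eq:CG-conv-rate} guarantees
% in order to reduce the A-norm of the error by a factor eps_ (illustrative values)
kappa = 5;                               % e.g. kappa(A) = 5 for test matrix 1 below
rho   = (sqrt(kappa) - 1)/(sqrt(kappa) + 1);
eps_  = 1e-8;                            % target reduction factor
k_min = ceil(log(eps_/2)/log(rho));      % smallest k with 2*rho^k <= eps_
fprintf('kappa = %g: the bound guarantees the reduction within %d iterations\n', kappa, k_min);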

Given a matrix $A$, in this lab you are asked to:


The matrices $A$ you should consider working with are listed below (a MATLAB construction sketch for the first two follows the list):

1. Diagonal $100\times 100$ matrix with eigenvalues $\lambda_1=5$, $\lambda_2=4$, $\lambda_3 = 1.5$, $\lambda_4=1.4$, $\lambda_5=1.3$ and $\lambda_6=\cdots=\lambda_{100}=1$.

2. $$ A_{ij} = \begin{cases} \sqrt{i} & \text{if } i = j \newline 1 & \text{if } |i-j| = k \newline 0 & \text{otherwise} \end{cases} $$

3.
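
For reference, here is a minimal MATLAB sketch that builds the first two test matrices. The size and the band offset $k$ of matrix 2 are not fixed in the text above, so the values n2 and koff used below are placeholders to be replaced as required.

% Test matrix 1: diagonal 100 x 100 with the prescribed eigenvalues
n1 = 100;
A1 = diag([5 4 1.5 1.4 1.3 ones(1, n1-5)]);

% Test matrix 2: sqrt(i) on the diagonal, ones on the k-th sub/super-diagonals
% (n2 and koff below are illustrative placeholders)
n2   = 100;
koff = 10;
e    = ones(n2 - koff, 1);
A2   = diag(sqrt(1:n2)) + diag(e, koff) + diag(e, -koff);
A2   = sparse(A2);   % only O(n2) entries are non-zero, so store it as sparse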


Pseudocode of the PCG algorithm

function x = my_pcg(A,P,b,x0,maxiter,tol)
%  Input: A, SPD matrix
%         P, SPD preconditioner
%         b, rhs vector
%         x0, starting vector
%         maxiter, max number of iterations allowed
%         tol, tolerance on the relative residual norm
% Output: x, approximation of the solution to Ax=b

x = x0;
r = b - A*x;                  % initial residual r_0
z = P \ r;                    % preconditioned residual: solve P*z = r (do not form inv(P))
p = z;                        % first search direction
rz = r' * z;                  % inner product (r, P^{-1} r)
normb = norm(b);

for k = 1:maxiter
    Ap = A * p;               % the only product with A per iteration
    alpha = rz / (p' * Ap);   % step length
    x = x + alpha * p;        % update the iterate
    r = r - alpha * Ap;       % update the (recursive) residual
    if norm(r) <= tol * normb % stopping criterion on the relative residual
        break;
    end
    z = P \ r;                % apply the preconditioner
    rz_new = r' * z;
    beta = rz_new / rz;       % conjugation coefficient
    p = z + beta * p;         % new search direction
    rz = rz_new;
end

end
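
As a quick sanity check, here is one possible way to call my_pcg (an illustrative sketch only: the 2D Poisson matrix and the incomplete Cholesky preconditioner are convenient testing choices, not the lab's prescribed matrices).

% Illustrative call: 2D Poisson matrix with an incomplete Cholesky preconditioner
A  = gallery('poisson', 30);          % SPD, sparse, 900 x 900
b  = ones(size(A,1), 1);
x0 = zeros(size(b));

L = ichol(A);                         % incomplete Cholesky factor, A ~ L*L'
x_cg  = my_pcg(A, speye(size(A,1)), b, x0, 500, 1e-10);  % plain CG (P = I)
x_pcg = my_pcg(A, L*L', b, x0, 500, 1e-10);              % PCG with P = L*L'

fprintf('relative residuals: %.2e (CG), %.2e (PCG)\n', ...
        norm(b - A*x_cg)/norm(b), norm(b - A*x_pcg)/norm(b));

Passing the matrix P and solving P*z = r with the backslash operator keeps the interface simple; in a production code one would rather pass the factor L and apply z = L'\(L\r), so that L*L' is never formed explicitly.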

Implementation notes

Note that the method can (and should) be implemented so that only four vectors are allocated in memory overall. Moreover, the most costly operation per step is the product of $A$ with a vector, which requires $O(nnz)$ flops, where $nnz$ is the number of non-zero entries of $A$.
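
To see the $O(nnz)$ cost in practice, you can time a single product of a large sparse matrix with a vector; the tridiagonal matrix below is just a convenient illustrative example.

n = 1e5;
e = ones(n, 1);
A = spdiags([-e 2*e -e], -1:1, n, n);  % sparse SPD tridiagonal matrix, nnz(A) ~ 3n
v = rand(n, 1);
fprintf('nnz(A) = %d out of %d entries\n', nnz(A), n^2);
tic; w = A*v; toc                      % one product with A: O(nnz) flops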

You can plot the norm of the recursively updated residual $\|r_k\|$ and the norm of the true residual $\|b - Ax_k\|$ and compare them. You should observe that the two eventually differ, because in finite precision the recursive update accumulates rounding errors. Also, you should observe that the upper bound \eqref{eq:CG-conv-rate} is quite loose; still, when $\kappa(A)\gg 1$ the method can be very slow.
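
A possible sketch for this comparison is below. It assumes my_pcg has been extended with two extra (hypothetical) outputs, not part of the interface above: resvec, the norms of the recursively updated residuals, and X, the matrix whose k-th column is the iterate $x_k$. Keep in mind that the bound \eqref{eq:CG-conv-rate} controls the energy norm of the error, not the residual, so plotting it alongside the residuals is only a qualitative comparison.

% Compare recursive residual, true residual, and the bound on test matrix 1
n  = 100;
A  = diag([5 4 1.5 1.4 1.3 ones(1, n-5)]);
b  = randn(n, 1);
x0 = zeros(n, 1);

[x, resvec, X] = my_pcg(A, eye(n), b, x0, 60, 1e-14);  % hypothetical extended outputs
res_true = vecnorm(b - A*X);                           % true residuals ||b - A*x_k||

kappa = cond(A);
k     = 1:numel(resvec);
bound = 2*((sqrt(kappa) - 1)/(sqrt(kappa) + 1)).^k;    % right-hand side of \eqref{eq:CG-conv-rate}

semilogy(k, resvec, 'o-', k, res_true, 's-', k, bound, '--');
legend('recursive residual', 'true residual', 'CG bound', 'Location', 'southwest');
xlabel('iteration k'); ylabel('norm (log scale)');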