Using Conjugate Gradient to Solve Classical Matrix Equations

In linear algebra, the conjugate gradient method is an iterative algorithm for numerically approximating the solution of a system of linear equations whose coefficient matrix is symmetric and positive-definite. This post focuses on the conjugate gradient method and its applications to solving matrix equations. The whole story covers the following contents:

  • Introduction
  • Preliminaries
    • Vectorization operator
    • Kronecker product
  • Classical matrix equations
    • Lyapunov equation
    • Sylvester equation
    • Generalized Sylvester equation
  • Conjugate gradient (CG) method
    • Algorithm
    • Solving linear equation
    • Solving matrix equation
    • Solving matrix factorization on incomplete data
  • CG’s potential for large-scale and sparse problems
  • Conclusion

Preliminaries

We introduce the vectorization operator and the Kronecker product, which are the basic building blocks for the following content.

Vectorization Operator

Vectorization is a basic operator that converts a matrix (or tensor) into a vector by stacking the columns of the matrix into a single “long” vector. For instance, the vectorization of

$$\boldsymbol{X} = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix} \in \mathbb{R}^{2\times 2}$$

is

$$\operatorname{vec}(\boldsymbol{X}) = \begin{bmatrix} x_{11} \\ x_{21} \\ x_{12} \\ x_{22} \end{bmatrix} \in \mathbb{R}^{4}.$$
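
In NumPy, column-wise stacking corresponds to reshaping in Fortran (column-major) order, which is also how the code later in this post vectorizes matrices. A minimal check:

import numpy as np

X = np.array([[1, 2], [3, 4]])
# stack the columns of X into a single vector (column-major order)
print(np.reshape(X, -1, order='F'))  # [1 3 2 4]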

Kronecker Product

The Kronecker product is important and widely used in matrix computations; it is named after the German mathematician Leopold Kronecker (1823–1891). By definition, the Kronecker product between two matrices $\boldsymbol{A} = [a_{ij}] \in \mathbb{R}^{m\times n}$ and $\boldsymbol{B} \in \mathbb{R}^{p\times q}$ is

$$\boldsymbol{A} \otimes \boldsymbol{B} = \begin{bmatrix} a_{11}\boldsymbol{B} & \cdots & a_{1n}\boldsymbol{B} \\ \vdots & \ddots & \vdots \\ a_{m1}\boldsymbol{B} & \cdots & a_{mn}\boldsymbol{B} \end{bmatrix} \in \mathbb{R}^{mp\times nq},$$

where the symbol $\otimes$ denotes the Kronecker product.

For example, the Kronecker product between the two matrices

$$\boldsymbol{A} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \quad \boldsymbol{B} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$$

is

$$\boldsymbol{A} \otimes \boldsymbol{B} = \begin{bmatrix} 0 & 1 & 0 & 2 \\ 1 & 0 & 2 & 0 \\ 0 & 3 & 0 & 4 \\ 3 & 0 & 4 & 0 \end{bmatrix}.$$

Putting the vectorization operator and the Kronecker product together, there exists an important property for the linear system $\boldsymbol{A}\boldsymbol{X}\boldsymbol{B} = \boldsymbol{C}$, which is given by

$$\operatorname{vec}(\boldsymbol{A}\boldsymbol{X}\boldsymbol{B}) = (\boldsymbol{B}^\top \otimes \boldsymbol{A})\operatorname{vec}(\boldsymbol{X}).$$

This property allows one to express a matrix equation like $\boldsymbol{A}\boldsymbol{X}\boldsymbol{B} = \boldsymbol{C}$ in the form of a standard linear equation: $(\boldsymbol{B}^\top \otimes \boldsymbol{A})\operatorname{vec}(\boldsymbol{X}) = \operatorname{vec}(\boldsymbol{C})$.
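
As a quick numerical check of this property, here is a small sketch with arbitrary random matrices:

import numpy as np

# arbitrary example matrices
A = np.random.randn(3, 4)
X = np.random.randn(4, 5)
B = np.random.randn(5, 2)

# vec(A X B), using column-major (Fortran-order) stacking
left = (A @ X @ B).flatten(order='F')

# (B^T ⊗ A) vec(X)
right = np.kron(B.T, A) @ X.flatten(order='F')

print(np.allclose(left, right))  # expected: True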

Classical Matrix Equations

Sylvester Equation

The Sylvester equation is a linear matrix equation that usually arises in control theory applications. It is named after the English mathematician James Joseph Sylvester (1814–1897), who studied the homogeneous version of the equation in 1884.

Given matrices $\boldsymbol{A} \in \mathbb{R}^{m\times m}$, $\boldsymbol{B} \in \mathbb{R}^{n\times n}$, and $\boldsymbol{C} \in \mathbb{R}^{m\times n}$, a Sylvester equation is a matrix equation of the form

$$\boldsymbol{A}\boldsymbol{X} + \boldsymbol{X}\boldsymbol{B} = \boldsymbol{C},$$

where the problem is to find the solution $\boldsymbol{X} \in \mathbb{R}^{m\times n}$.

Since the Sylvester equation is indeed a linear equation, it is possible to write it in the form of a standard linear equation $\tilde{\boldsymbol{A}}\boldsymbol{x} = \boldsymbol{b}$. This can be done by using both the vectorization operator and the Kronecker product, which yields

$$(\boldsymbol{I}_n \otimes \boldsymbol{A} + \boldsymbol{B}^\top \otimes \boldsymbol{I}_m)\operatorname{vec}(\boldsymbol{X}) = \operatorname{vec}(\boldsymbol{C}).$$

This formula allows one to obtain the closed-form solution by solving an $mn \times mn$ linear system; however, it would cost $\mathcal{O}(m^3 n^3)$. Can we do better? One famous method with an acceptable complexity of $\mathcal{O}(m^3 + n^3)$ is the Bartels–Stewart algorithm, which was developed by Bartels and Stewart in 1972.

For instance, given

$$\boldsymbol{A} = \begin{bmatrix} 1 & 0 & 2 & 3 \\ 4 & 1 & 0 & 2 \\ 0 & 5 & 5 & 6 \\ 1 & 7 & 9 & 0 \end{bmatrix}, \quad \boldsymbol{B} = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}, \quad \boldsymbol{C} = \begin{bmatrix} 1 & 0 \\ 2 & 0 \\ 0 & 3 \\ 1 & 1 \end{bmatrix},$$

what is the solution $\boldsymbol{X}$ of $\boldsymbol{A}\boldsymbol{X} + \boldsymbol{X}\boldsymbol{B} = \boldsymbol{C}$? In Python, the solution can be computed with scipy.linalg.solve_sylvester:

import numpy as np
from scipy import linalg

A = np.array([[1, 0, 2, 3], [4, 1, 0, 2], [0, 5, 5, 6], [1, 7, 9, 0]])
B = np.array([[0, -1], [1, 0]])
C = np.array([[1, 0], [2, 0], [0, 3], [1, 1]])

# solve AX + XB = C (SciPy implements the Bartels-Stewart algorithm)
X = linalg.solve_sylvester(A, B, C)
print(X)

The output would be

[[ 0.47316381 -0.36642543]
 [-0.40056724  0.35311491]
 [ 0.33053301 -0.11422005]
 [ 0.07739853  0.35600978]]

Generalized Sylvester Equation

The Sylvester equation has many variants and special cases. It can be generalized to multiple terms and to have coefficient matrices on both sides of $\boldsymbol{X}$, yielding

$$\sum_{k=1}^{K} \boldsymbol{A}_k \boldsymbol{X} \boldsymbol{B}_k = \boldsymbol{C},$$

which reduces to the standard Sylvester equation when $K = 2$, $\boldsymbol{A}_1 = \boldsymbol{A}$, $\boldsymbol{B}_1 = \boldsymbol{I}$, $\boldsymbol{A}_2 = \boldsymbol{I}$, and $\boldsymbol{B}_2 = \boldsymbol{B}$.
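
Using the vectorization property above, the generalized equation can still be written as a standard linear system $\left(\sum_{k} \boldsymbol{B}_k^\top \otimes \boldsymbol{A}_k\right)\operatorname{vec}(\boldsymbol{X}) = \operatorname{vec}(\boldsymbol{C})$ and, for small sizes, solved directly. The brute-force sketch below uses arbitrary random coefficients and assumes the resulting Kronecker system is nonsingular; it is only for illustration, since it costs $\mathcal{O}(m^3 n^3)$.

import numpy as np

m, n = 4, 3
# two-term example: A1 X B1 + A2 X B2 = C, with random coefficient matrices
A1, A2 = np.random.randn(m, m), np.random.randn(m, m)
B1, B2 = np.random.randn(n, n), np.random.randn(n, n)
X_true = np.random.randn(m, n)
C = A1 @ X_true @ B1 + A2 @ X_true @ B2

# (B1^T ⊗ A1 + B2^T ⊗ A2) vec(X) = vec(C)
K = np.kron(B1.T, A1) + np.kron(B2.T, A2)
x = np.linalg.solve(K, C.flatten(order='F'))
X = np.reshape(x, (m, n), order='F')
print(np.allclose(X, X_true))  # expected: True when the system is nonsingular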

Conjugate Gradient (CG) Method

Algorithm

For solving the linear equation $\boldsymbol{A}\boldsymbol{x} = \boldsymbol{b}$ in which $\boldsymbol{A}$ is a real, symmetric, and positive-definite matrix, the input vector $\boldsymbol{x}_0$ is the initial value (e.g., an all-zero vector).

  • $\boldsymbol{r}_0 := \boldsymbol{b} - \boldsymbol{A}\boldsymbol{x}_0$
  • if $\boldsymbol{r}_0$ is sufficiently small, then return $\boldsymbol{x}_0$ as the result
  • $\boldsymbol{p}_0 := \boldsymbol{r}_0$, $k := 0$
  • repeat
    • $\alpha_k := \frac{\boldsymbol{r}_k^\top \boldsymbol{r}_k}{\boldsymbol{p}_k^\top \boldsymbol{A}\boldsymbol{p}_k}$
    • $\boldsymbol{x}_{k+1} := \boldsymbol{x}_k + \alpha_k \boldsymbol{p}_k$
    • $\boldsymbol{r}_{k+1} := \boldsymbol{r}_k - \alpha_k \boldsymbol{A}\boldsymbol{p}_k$
    • if $\boldsymbol{r}_{k+1}$ is sufficiently small, then exit loop
    • $\beta_k := \frac{\boldsymbol{r}_{k+1}^\top \boldsymbol{r}_{k+1}}{\boldsymbol{r}_k^\top \boldsymbol{r}_k}$
    • $\boldsymbol{p}_{k+1} := \boldsymbol{r}_{k+1} + \beta_k \boldsymbol{p}_k$
    • $k := k + 1$
  • end repeat
  • return $\boldsymbol{x}_{k+1}$ as the result
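
Solving Linear Equation

Here is a minimal NumPy sketch of the pseudocode above (the function and variable names are my own, not from any particular library), applied to a small symmetric positive-definite system:

import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iter=1000):
    # solve Ax = b for a real, symmetric, positive-definite A
    x = np.zeros_like(b, dtype=float) if x0 is None else x0.astype(float)
    r = b - A @ x
    p = r.copy()
    rold = np.inner(r, r)
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rold / np.inner(p, Ap)
        x += alpha * p
        r -= alpha * Ap
        rnew = np.inner(r, r)
        if np.sqrt(rnew) < tol:
            break
        p = r + (rnew / rold) * p
        rold = rnew
    return x

# a small SPD test problem: M^T M + I is symmetric positive-definite
M = np.random.randn(5, 5)
A = M.T @ M + np.eye(5)
b = np.random.randn(5)
x = conjugate_gradient(A, b)
print(np.allclose(A @ x, b))  # expected: True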

Solving Matrix Equation

To apply CG to the Sylvester equation $\boldsymbol{A}\boldsymbol{X} + \boldsymbol{X}\boldsymbol{B} = \boldsymbol{C}$, note that the coefficient matrix $\boldsymbol{M} = \boldsymbol{I}_n \otimes \boldsymbol{A} + \boldsymbol{B}^\top \otimes \boldsymbol{I}_m$ of the vectorized system is in general not symmetric, so CG is applied to the normal equations $\boldsymbol{M}^\top\boldsymbol{M}\operatorname{vec}(\boldsymbol{X}) = \boldsymbol{M}^\top\operatorname{vec}(\boldsymbol{C})$ instead. The function compute_Ax below evaluates $\boldsymbol{M}^\top\boldsymbol{M}\operatorname{vec}(\boldsymbol{X})$ in its matrix form, without ever forming the Kronecker products explicitly.

import numpy as np

def compute_Ax(A, B, X):
    # return M^T M vec(X) in matrix form, where M = I ⊗ A + B^T ⊗ I
    return np.reshape(A.T @ A @ X + A.T @ X @ B + A @ X @ B.T + X @ B @ B.T, -1, order='F')

def solve_sylvester(A, B, C, max_iter=1000, tol=1e-10):
    # solve AX + XB = C by applying CG to the normal equations
    dim1 = A.shape[1]
    dim2 = B.shape[0]
    X = np.random.randn(dim1, dim2)                    # random initial value
    x = np.reshape(X, -1, order='F')
    b = np.reshape(A.T @ C + C @ B.T, -1, order='F')   # right-hand side M^T vec(C)
    r = b - compute_Ax(A, B, X)                        # initial residual
    p = r.copy()
    rold = np.inner(r, r)
    for it in range(max_iter):
        Ap = compute_Ax(A, B, np.reshape(p, (dim1, dim2), order='F'))
        alpha = rold / np.inner(p, Ap)
        x += alpha * p
        r -= alpha * Ap
        rnew = np.inner(r, r)
        if np.sqrt(rnew) < tol:
            break
        p = r + (rnew / rold) * p
        rold = rnew
    return np.reshape(x, (dim1, dim2), order='F')

A = np.array([[1, 0, 2, 3], [4, 1, 0, 2], [0, 5, 5, 6], [1, 7, 9, 0]])
B = np.array([[0, -1], [1, 0]])
C = np.array([[1, 0], [2, 0], [0, 3], [1, 1]])
print(solve_sylvester(A, B, C))
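
As a sanity check, reusing the matrices and the solve_sylvester function defined above, the CG-based solution can be compared against SciPy's direct solver:

from scipy import linalg

X_cg = solve_sylvester(A, B, C)
X_ref = linalg.solve_sylvester(A, B, C)
print(np.max(np.abs(X_cg - X_ref)))  # should be close to zero once CG has converged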

Solving Matrix Factorization on Incomplete Data

Matrix factorization is an important tool for many real-world applications, e.g., recommender systems. The idea behind matrix factorization is to represent a partially observed matrix in a lower-dimensional latent space. For any partially observed data matrix $\boldsymbol{Y} \in \mathbb{R}^{m\times n}$ with the observed index set $\Omega$, one can approximately factorize the data matrix into latent factor matrices $\boldsymbol{W} \in \mathbb{R}^{r\times m}$ and $\boldsymbol{X} \in \mathbb{R}^{r\times n}$ of rank $r$. Element-wise, any $(i, j)$th entry of $\boldsymbol{Y}$ can be reconstructed by the combination of latent factors:

$$y_{ij} \approx \boldsymbol{w}_i^\top \boldsymbol{x}_j,$$

where $\boldsymbol{w}_i$ and $\boldsymbol{x}_j$ are the $i$th and $j$th columns of $\boldsymbol{W}$ and $\boldsymbol{X}$, respectively. They are referred to as latent factors.

If $\mathcal{P}_\Omega(\cdot)$ denotes the orthogonal projection supported on the observed index set $\Omega$, then the above matrix factorization can be rewritten as follows,

$$\mathcal{P}_\Omega(\boldsymbol{Y}) \approx \mathcal{P}_\Omega(\boldsymbol{W}^\top \boldsymbol{X}).$$

To achieve such an approximation, one fundamental problem is how to determine the latent factor matrices so that the partially observed data matrix is matched as closely as possible. One can formulate an optimization problem with respect to the factor matrices, and it follows that

$$\min_{\boldsymbol{W}, \boldsymbol{X}} \; \frac{1}{2} \left\| \mathcal{P}_\Omega(\boldsymbol{Y} - \boldsymbol{W}^\top \boldsymbol{X}) \right\|_F^2.$$

Let $f$ denote the objective function above; then the first-order partial derivative with respect to $\boldsymbol{W}$ is given by

$$\frac{\partial f}{\partial \boldsymbol{W}} = -\boldsymbol{X}\left[\mathcal{P}_\Omega(\boldsymbol{Y} - \boldsymbol{W}^\top \boldsymbol{X})\right]^\top.$$

With respect to $\boldsymbol{X}$, we have

$$\frac{\partial f}{\partial \boldsymbol{X}} = -\boldsymbol{W}\,\mathcal{P}_\Omega(\boldsymbol{Y} - \boldsymbol{W}^\top \boldsymbol{X}).$$

Setting these partial derivatives to zero gives, for each factor matrix with the other one fixed, a linear system of exactly the kind that CG can solve.
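
The sketch below illustrates this idea under some assumptions of mine: the factors are updated in an alternating fashion, each subproblem is solved by a matrix-free CG on its stationarity condition, and a small ridge term rho (not part of the objective above) is added so that each subproblem is strictly positive-definite. All function names are made up for illustration.

import numpy as np

def cg_matrix(op, b, x0, max_iter=100, tol=1e-8):
    # matrix-free CG: `op` is a symmetric positive-definite linear operator on matrices
    x = x0.copy()
    r = b - op(x)
    p = r.copy()
    rold = np.sum(r * r)
    for _ in range(max_iter):
        Ap = op(p)
        alpha = rold / np.sum(p * Ap)
        x += alpha * p
        r -= alpha * Ap
        rnew = np.sum(r * r)
        if np.sqrt(rnew) < tol:
            break
        p = r + (rnew / rold) * p
        rold = rnew
    return x

def mf_cg(Y, mask, rank, rho=1e-2, num_iter=50):
    # alternating minimization for P_Omega(Y) ≈ P_Omega(W^T X); each update uses CG
    m, n = Y.shape
    W = 0.1 * np.random.randn(rank, m)
    X = 0.1 * np.random.randn(rank, n)
    P = lambda M: mask * M              # orthogonal projection onto the observed set
    for _ in range(num_iter):
        # stationarity in W: X P(W^T X)^T + rho W = X P(Y)^T
        W = cg_matrix(lambda V: X @ P(V.T @ X).T + rho * V, X @ P(Y).T, W)
        # stationarity in X: W P(W^T X) + rho X = W P(Y)
        X = cg_matrix(lambda V: W @ P(W.T @ V) + rho * V, W @ P(Y), X)
    return W, X

# toy test: a rank-3 matrix with roughly 40% of its entries observed
m, n, r = 50, 40, 3
Y = np.random.randn(r, m).T @ np.random.randn(r, n)
mask = (np.random.rand(m, n) < 0.4).astype(float)
W, X = mf_cg(Y, mask, rank=r)
print(np.linalg.norm(mask * (Y - W.T @ X)) / np.linalg.norm(mask * Y))  # small if the fit is good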

Conclusion

In recent years, many machine learning problems arising from large-scale and sparse data are closely associated with solving linear systems. There remain open problems in modeling matrix equations with certain structures (e.g., learning matrix factorization from partially observed data) and in deriving efficient numerical methods for them.