Using Conjugate Gradient to Solve Classical Matrix Equations
In linear algebra, the conjugate gradient method is an algorithm for numerically approximating the solution of a system of linear equations. This post focuses on the conjugate gradient method and its applications to solving matrix equations. The whole story will cover the following contents:
- Introduction
- Preliminaries
- Vectorization operator
- Kronecker product
- Classical matrix equations
- Lyapunov equation
- Sylvester equation
- Generalized Sylvester equation
- Conjugate gradient (CG) method
- Algorithm
- Solving linear equation
- Solving matrix equation
- Solving matrix factorization on incomplete data
- CG’s potential for large-scale and sparse problems
- Conclusion
Preliminaries
We introduce the vectorization operator and the Kronecker product, which are the basic building blocks for the following content.
Vectorization Operator
Vectorization is a basic operator for converting a matrix (or tensor) into a vector. It follows the rule that the operator stacks the columns of a matrix into a “long” vector. Formally, for any matrix $X = [\boldsymbol{x}_1, \boldsymbol{x}_2, \ldots, \boldsymbol{x}_n] \in \mathbb{R}^{m \times n}$ with columns $\boldsymbol{x}_j \in \mathbb{R}^{m}$, the vectorization is
$$\operatorname{vec}(X) = [\boldsymbol{x}_1^\top, \boldsymbol{x}_2^\top, \ldots, \boldsymbol{x}_n^\top]^\top \in \mathbb{R}^{mn}.$$
For instance, the vectorization of $\begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}$ is $[1, 2, 3, 4]^\top$.
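A quick way to realize this operator in NumPy is column-major reshaping (order='F'); the following minimal snippet is only an illustration.
import numpy as np

# Stack the columns of a matrix into a single vector (column-major order).
X = np.array([[1, 3], [2, 4]])
vec_X = np.reshape(X, -1, order='F')
print(vec_X)  # [1 2 3 4]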
Kronecker Product
The Kronecker product is important and widely used in matrix computations. It is named after the famous mathematician Leopold Kronecker (1823–1891). By definition, the Kronecker product between two matrices $A = [a_{ij}] \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{p \times q}$ is
$$A \otimes B = \begin{bmatrix} a_{11} B & a_{12} B & \cdots & a_{1n} B \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} B & a_{m2} B & \cdots & a_{mn} B \end{bmatrix} \in \mathbb{R}^{(mp) \times (nq)},$$
where the symbol $\otimes$ denotes the Kronecker product.
For example, the Kronecker product between two matrices $\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$ and $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$ is
$$\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \otimes \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 & 2 \\ 1 & 0 & 2 & 0 \\ 0 & 3 & 0 & 4 \\ 3 & 0 & 4 & 0 \end{bmatrix}.$$
Putting the vectorization operator and the Kronecker product together, there exists an important property for the linear system $AXB = C$, which is given by
$$\operatorname{vec}(AXB) = (B^\top \otimes A)\operatorname{vec}(X).$$
This property allows one to express a matrix equation like $AXB = C$ in the form of a standard linear equation:
$$(B^\top \otimes A)\operatorname{vec}(X) = \operatorname{vec}(C).$$
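To make the identity concrete, the snippet below (with small random matrices chosen only for illustration) checks it numerically.
import numpy as np

# Numerically check vec(A X B) = (B^T kron A) vec(X) on random matrices.
A = np.random.randn(3, 4)
X = np.random.randn(4, 5)
B = np.random.randn(5, 2)
left = np.reshape(A @ X @ B, -1, order='F')
right = np.kron(B.T, A) @ np.reshape(X, -1, order='F')
print(np.allclose(left, right))  # True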
Classical Matrix Equations
Sylvester Equation
The Sylvester equation is a linear matrix equation that usually arises in applications in control theory. This kind of matrix equation is named after the famous mathematician James Joseph Sylvester (1814–1897), who contributed the homogeneous version of the equation in 1884.
Given matrices $A \in \mathbb{R}^{m \times m}$, $B \in \mathbb{R}^{n \times n}$, and $C \in \mathbb{R}^{m \times n}$, a Sylvester equation is a matrix equation of the form
$$AX + XB = C,$$
where the problem is seeking the possible solution $X \in \mathbb{R}^{m \times n}$.
Since the Sylvester equation is indeed a linear equation, it is possible to write it in the standard linear form $\tilde{A}\boldsymbol{x} = \boldsymbol{b}$. This can be done by using both the vectorization operator and the Kronecker product, which yields
$$(I_n \otimes A + B^\top \otimes I_m)\operatorname{vec}(X) = \operatorname{vec}(C).$$
This formula allows one to obtain the closed-form solution; however, solving the resulting $mn \times mn$ dense system would cost $\mathcal{O}((mn)^3)$. Can we do better? One famous method with an acceptable complexity is the Bartels–Stewart algorithm, developed by Bartels and Stewart in 1972, which costs only $\mathcal{O}(m^3 + n^3)$.
For instance, given
$$A = \begin{bmatrix} 1 & 0 & 2 & 3 \\ 4 & 1 & 0 & 2 \\ 0 & 5 & 5 & 6 \\ 1 & 7 & 9 & 0 \end{bmatrix}, \quad B = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}, \quad C = \begin{bmatrix} 1 & 0 \\ 2 & 0 \\ 0 & 3 \\ 1 & 1 \end{bmatrix},$$
what is the solution of $AX + XB = C$? We can call `solve_sylvester` in SciPy, which implements the Bartels–Stewart algorithm:
import numpy as np
from scipy import linalg
A = np.array([[1, 0, 2, 3], [4, 1, 0, 2], [0, 5, 5, 6], [1, 7, 9, 0]])
B = np.array([[0, -1], [1, 0]])
C = np.array([[1, 0], [2, 0], [0, 3], [1, 1]])
X = linalg.solve_sylvester(A, B, C)
print(X)
The output would be
[[ 0.47316381 -0.36642543]
[-0.40056724 0.35311491]
[ 0.33053301 -0.11422005]
[ 0.07739853 0.35600978]]
Generalized Sylvester Equation
The Sylvester equation has many variants and special cases. It can be generalized to multiple terms and to coefficient matrices on both sides of $X$, yielding
$$\sum_{k=1}^{K} A_k X B_k = C,$$
where $A_k \in \mathbb{R}^{m \times m}$, $B_k \in \mathbb{R}^{n \times n}$, and $C \in \mathbb{R}^{m \times n}$. Vectorizing it with the Kronecker product gives $\bigl(\sum_{k=1}^{K} B_k^\top \otimes A_k\bigr)\operatorname{vec}(X) = \operatorname{vec}(C)$.
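For small problem sizes, one can form this Kronecker-structured system explicitly and solve it with a dense solver; the sketch below (with arbitrary random matrices, for illustration only) does so for a two-term generalized Sylvester equation.
import numpy as np

# Solve A1 @ X @ B1 + A2 @ X @ B2 = C by Kronecker vectorization (small sizes only).
m, n = 4, 3
A1, A2 = np.random.randn(m, m), np.random.randn(m, m)
B1, B2 = np.random.randn(n, n), np.random.randn(n, n)
C = np.random.randn(m, n)
K = np.kron(B1.T, A1) + np.kron(B2.T, A2)
x = np.linalg.solve(K, np.reshape(C, -1, order='F'))
X = np.reshape(x, (m, n), order='F')
print(np.allclose(A1 @ X @ B1 + A2 @ X @ B2, C))  # True (if K is nonsingular)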
Conjugate Gradient (CG) Method
Algorithm
For solving the linear equation $A\boldsymbol{x} = \boldsymbol{b}$ in which $A \in \mathbb{R}^{n \times n}$ is a real, symmetric, and positive-definite matrix, the conjugate gradient method proceeds as follows. The input vector $\boldsymbol{x}_0$ is the initial guess. A direct NumPy translation is given after this list.
- $\boldsymbol{r}_0 = \boldsymbol{b} - A\boldsymbol{x}_0$
- if $\boldsymbol{r}_0$ is sufficiently small, then return $\boldsymbol{x}_0$ as the result
- $\boldsymbol{p}_0 = \boldsymbol{r}_0$ and $k = 0$
- repeat
- $\alpha_k = \frac{\boldsymbol{r}_k^\top \boldsymbol{r}_k}{\boldsymbol{p}_k^\top A \boldsymbol{p}_k}$
- $\boldsymbol{x}_{k+1} = \boldsymbol{x}_k + \alpha_k \boldsymbol{p}_k$
- $\boldsymbol{r}_{k+1} = \boldsymbol{r}_k - \alpha_k A \boldsymbol{p}_k$
- if $\boldsymbol{r}_{k+1}$ is sufficiently small, then exit loop
- $\beta_k = \frac{\boldsymbol{r}_{k+1}^\top \boldsymbol{r}_{k+1}}{\boldsymbol{r}_k^\top \boldsymbol{r}_k}$
- $\boldsymbol{p}_{k+1} = \boldsymbol{r}_{k+1} + \beta_k \boldsymbol{p}_k$
- $k = k + 1$
- end repeat
- return $\boldsymbol{x}_{k+1}$ as the result
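Translating the pseudocode above into NumPy for a generic linear equation $A\boldsymbol{x} = \boldsymbol{b}$ gives the following minimal sketch (the small symmetric positive-definite test matrix is constructed only for illustration).
import numpy as np

def conjugate_gradient(A, b, x0, maxiter=1000, tol=1e-10):
    # CG for A x = b with A real, symmetric, and positive-definite.
    x = x0.astype(float)
    r = b - A @ x
    p = r.copy()
    rold = np.inner(r, r)
    for _ in range(maxiter):
        Ap = A @ p
        alpha = rold / np.inner(p, Ap)
        x += alpha * p
        r -= alpha * Ap
        rnew = np.inner(r, r)
        if np.sqrt(rnew) < tol:
            break
        p = r + (rnew / rold) * p
        rold = rnew
    return x

M = np.random.randn(5, 5)
A = M @ M.T + 5 * np.eye(5)   # symmetric positive-definite test matrix
b = np.random.randn(5)
x = conjugate_gradient(A, b, np.zeros(5))
print(np.allclose(A @ x, b))  # True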
Solving Matrix Equation
Recall that the Sylvester equation $AX + XB = C$ corresponds to the linear system $(I_n \otimes A + B^\top \otimes I_m)\operatorname{vec}(X) = \operatorname{vec}(C)$, whose coefficient matrix is in general not symmetric positive-definite, so CG cannot be applied directly. A standard remedy is to apply CG to the normal equations, which in matrix form read
$$A^\top A X + A^\top X B + A X B^\top + X B B^\top = A^\top C + C B^\top.$$
The implementation below does exactly this, and it evaluates the matrix-vector product without ever forming the Kronecker-structured coefficient matrix explicitly.
import numpy as np

def compute_Ax(A, B, X):
    # Apply the normal-equation operator of the Sylvester equation to X,
    # i.e., vec(A.T @ A @ X + A.T @ X @ B + A @ X @ B.T + X @ B @ B.T).
    return np.reshape(A.T @ A @ X + A.T @ X @ B + A @ X @ B.T + X @ B @ B.T,
                      -1, order='F')

def solve_sylvester(A, B, C):
    dim1 = A.shape[1]
    dim2 = B.shape[0]
    X = np.random.randn(dim1, dim2)                    # random initial guess
    x = np.reshape(X, -1, order='F')
    b = np.reshape(A.T @ C + C @ B.T, -1, order='F')   # right-hand side of the normal equations
    Ax = compute_Ax(A, B, X)
    r = b - Ax                                         # initial residual
    p = r.copy()                                       # initial search direction
    rold = np.inner(r, r)
    # In exact arithmetic, CG converges in at most dim1 * dim2 iterations.
    for it in range(dim1 * dim2):
        Ap = compute_Ax(A, B, np.reshape(p, (dim1, dim2), order='F'))
        alpha = rold / np.inner(p, Ap)
        x += alpha * p
        r -= alpha * Ap
        rnew = np.inner(r, r)
        if np.sqrt(rnew) < 1e-10:
            break
        p = r + (rnew / rold) * p
        rold = rnew
    return np.reshape(x, (dim1, dim2), order='F')

A = np.array([[1, 0, 2, 3], [4, 1, 0, 2], [0, 5, 5, 6], [1, 7, 9, 0]])
B = np.array([[0, -1], [1, 0]])
C = np.array([[1, 0], [2, 0], [0, 3], [1, 1]])
print(solve_sylvester(A, B, C))
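Given enough iterations, the CG solution returned here should agree, up to the stopping tolerance, with the SciPy solution obtained above via the Bartels–Stewart algorithm.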
Solving Matrix Factorization on Incomplete Data
Matrix factorization is an important tool for many real-world applications, e.g., recommender systems. The idea behind matrix factorization is to represent a partially observed matrix in a lower-dimensional latent space. For any partially observed data matrix $Y \in \mathbb{R}^{m \times n}$ with the observed index set $\Omega$, one can approximately factorize the data matrix into latent factor matrices $W \in \mathbb{R}^{r \times m}$ and $H \in \mathbb{R}^{r \times n}$ of rank $r \ll \min\{m, n\}$, i.e., $Y \approx W^\top H$. Element-wise, any $(i, j)$th entry $y_{ij}$ in $Y$ can be reconstructed by the combination of latent factors:
$$y_{ij} \approx \boldsymbol{w}_i^\top \boldsymbol{h}_j,$$
where $\boldsymbol{w}_i$ and $\boldsymbol{h}_j$ are the $i$th and $j$th columns of $W$ and $H$, respectively. They are referred to as latent factors.
If $\mathcal{P}_{\Omega}(\cdot)$ denotes the orthogonal projection supported on the observed index set $\Omega$, i.e., $[\mathcal{P}_{\Omega}(Y)]_{ij} = y_{ij}$ if $(i, j) \in \Omega$ and $0$ otherwise, then the above matrix factorization can be rewritten as follows,
$$\mathcal{P}_{\Omega}(Y) \approx \mathcal{P}_{\Omega}(W^\top H).$$
To achieve such an approximation, one fundamental problem is how to determine the latent factor matrices so that the partially observed data matrix matches $W^\top H$ as closely as possible. One can formulate an optimization problem with respect to the factor matrices, and it follows that
$$\min_{W, H} \; \frac{1}{2} \bigl\| \mathcal{P}_{\Omega}(Y - W^\top H) \bigr\|_F^2.$$
Let $f = \frac{1}{2} \| \mathcal{P}_{\Omega}(Y - W^\top H) \|_F^2$, then the first-order partial derivative with respect to $W$ is
$$\frac{\partial f}{\partial W} = -H \bigl[\mathcal{P}_{\Omega}(Y - W^\top H)\bigr]^\top.$$
With respect to $H$, we have
$$\frac{\partial f}{\partial H} = -W \mathcal{P}_{\Omega}(Y - W^\top H).$$
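Setting each gradient to zero yields, for every column of $W$ (and of $H$), a small $r \times r$ symmetric positive-definite linear system, which is exactly where CG applies. The following sketch assumes a simple alternating-minimization scheme with a small ridge term added for numerical stability; the ridge term, the helper `als_cg`, and the synthetic test data are assumptions introduced here only for illustration, not a definitive implementation.
import numpy as np

def conjugate_gradient(A, b, x0, maxiter=100, tol=1e-10):
    # Standard CG for a symmetric positive-definite system A x = b.
    x = x0.astype(float)
    r = b - A @ x
    p = r.copy()
    rold = np.inner(r, r)
    for _ in range(maxiter):
        Ap = A @ p
        alpha = rold / np.inner(p, Ap)
        x += alpha * p
        r -= alpha * Ap
        rnew = np.inner(r, r)
        if np.sqrt(rnew) < tol:
            break
        p = r + (rnew / rold) * p
        rold = rnew
    return x

def als_cg(Y, mask, rank, num_iter=50, lam=1e-3):
    # Alternating minimization for P_Omega(Y) ~ P_Omega(W^T H); each column of
    # W and H is updated by solving its r-by-r normal equations with CG.
    # lam is a small ridge term (an assumption, for numerical stability).
    m, n = Y.shape
    W = 0.1 * np.random.randn(rank, m)
    H = 0.1 * np.random.randn(rank, n)
    for _ in range(num_iter):
        for i in range(m):
            cols = np.flatnonzero(mask[i, :])        # observed entries in row i
            Hi = H[:, cols]
            G = Hi @ Hi.T + lam * np.eye(rank)       # r x r SPD matrix
            W[:, i] = conjugate_gradient(G, Hi @ Y[i, cols], W[:, i])
        for j in range(n):
            rows = np.flatnonzero(mask[:, j])        # observed entries in column j
            Wj = W[:, rows]
            G = Wj @ Wj.T + lam * np.eye(rank)
            H[:, j] = conjugate_gradient(G, Wj @ Y[rows, j], H[:, j])
    return W, H

# Hypothetical test: a rank-2 matrix with roughly 30% of entries missing.
np.random.seed(0)
W0, H0 = np.random.randn(2, 20), np.random.randn(2, 15)
Y = W0.T @ H0
mask = np.random.rand(*Y.shape) > 0.3
W, H = als_cg(Y * mask, mask, rank=2)
err = np.linalg.norm(mask * (Y - W.T @ H)) / np.linalg.norm(mask * Y)
print(err)  # small relative residual on the observed entries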
Conclusion
In recent years, many machine learning problems arising from large-scale and sparse data have been closely associated with solving linear equations. There remain open problems in modeling matrix equations with certain structures (e.g., learning matrix factorization from partially observed data) and in deriving efficient numerical methods.