You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
1101 lines
43 KiB
1101 lines
43 KiB
// David Eberly, Geometric Tools, Redmond WA 98052
|
|
// Copyright (c) 1998-2021
|
|
// Distributed under the Boost Software License, Version 1.0.
|
|
// https://www.boost.org/LICENSE_1_0.txt
|
|
// https://www.geometrictools.com/License/Boost/LICENSE_1_0.txt
|
|
// Version: 4.0.2019.08.13
|
|
|
|
#pragma once
|
|
|
|
#include <Mathematics/Math.h>
|
|
#include <Mathematics/RangeIteration.h>
|
|
#include <algorithm>
|
|
#include <cstring>
|
|
#include <vector>
|
|
|
|
// The SingularValueDecomposition class is an implementation of Algorithm
|
|
// 8.3.2 (The SVD Algorithm) described in "Matrix Computations, 2nd
|
|
// edition" by G. H. Golub and Charles F. Van Loan, The Johns Hopkins
|
|
// Press, Baltimore MD, Fourth Printing 1993. Algorithm 5.4.2 (Householder
|
|
// Bidiagonalization) is used to reduce A to bidiagonal B. Algorithm 8.3.1
|
|
// (Golub-Kahan SVD Step) is used for the iterative reduction from bidiagonal
|
|
// to diagonal. If A is the original matrix, S is the matrix whose diagonal
|
|
// entries are the singular values, and U and V are corresponding matrices,
|
|
// then theoretically U^T*A*V = S. Numerically, we have errors
|
|
// E = U^T*A*V - S. Algorithm 8.3.2 mentions that one expects |E| is
|
|
// approximately u*|A|, where |M| denotes the Frobenius norm of M and where
|
|
// u is the unit roundoff for the floating-point arithmetic: 2^{-23} for
|
|
// 'float', which is FLT_EPSILON = 1.192092896e-7f, and 2^{-52} for'double',
|
|
// which is DBL_EPSILON = 2.2204460492503131e-16.
|
|
//
|
|
// The condition |a(i,i+1)| <= epsilon*(|a(i,i) + a(i+1,i+1)|) used to
|
|
// determine when the reduction decouples to smaller problems is implemented
|
|
// as: sum = |a(i,i)| + |a(i+1,i+1)|; sum + |a(i,i+1)| == sum. The idea is
|
|
// that the superdiagonal term is small relative to its diagonal neighbors,
|
|
// and so it is effectively zero. The unit tests have shown that this
|
|
// interpretation of decoupling is effective.
|
|
//
|
|
// The condition |a(i,i)| <= epsilon*|B| used to determine when the
|
|
// reduction decouples (with a zero singular value) is implemented using
|
|
// the Frobenius norm of B and epsilon = multiplier*u, where for now the
|
|
// multiplier is hard-coded in Solve(...) as 8.
|
|
//
|
|
// The authors suggest that once you have the bidiagonal matrix, a practical
|
|
// implementation will store the diagonal and superdiagonal entries in linear
|
|
// arrays, ignoring the theoretically zero values not in the 2-band. This is
|
|
// good for cache coherence, and we have used the suggestion. The essential
|
|
// parts of the Householder u-vectors are stored in the lower-triangular
|
|
// portion of the matrix and the essential parts of the Householder v-vectors
|
|
// are stored in the upper-triangular portion of the matrix. To avoid having
|
|
// to recompute 2/Dot(u,u) and 2/Dot(v,v) when constructing orthogonal U and
|
|
// V, we store these quantities in additional memory during bidiagonalization.
|
|
//
|
|
// For matrices with randomly generated values in [0,1], the unit tests
|
|
// produce the following information for N-by-N matrices.
|
|
//
|
|
// N |A| |E| |E|/|A| iterations
|
|
// -------------------------------------------
|
|
// 2 1.4831 4.1540e-16 2.8007e-16 1
|
|
// 3 2.1065 3.5024e-16 1.6626e-16 4
|
|
// 4 2.4979 7.4605e-16 2.9867e-16 6
|
|
// 5 3.6591 1.8305e-15 5.0025e-16 9
|
|
// 6 4.0572 2.0571e-15 5.0702e-16 10
|
|
// 7 4.7745 2.9057e-15 6.0859e-16 12
|
|
// 8 5.1964 2.7958e-15 5.3803e-16 13
|
|
// 9 5.7599 3.3128e-15 5.7514e-16 16
|
|
// 10 6.2700 3.7209e-15 5.9344e-16 16
|
|
// 11 6.8220 5.0580e-15 7.4142e-16 18
|
|
// 12 7.4540 5.2493e-15 7.0422e-16 21
|
|
// 13 8.1225 5.6043e-15 6.8997e-16 24
|
|
// 14 8.5883 5.8553e-15 6.8177e-16 26
|
|
// 15 9.1337 6.9663e-15 7.6270e-16 27
|
|
// 16 9.7884 9.1358e-15 9.3333e-16 29
|
|
// 17 10.2407 8.2715e-15 8.0771e-16 34
|
|
// 18 10.7147 8.9748e-15 8.3761e-16 33
|
|
// 19 11.1887 1.0094e-14 9.0220e-16 32
|
|
// 20 11.7739 9.7000e-15 8.2386e-16 35
|
|
// 21 12.2822 1.1217e-14 9.1329e-16 36
|
|
// 22 12.7649 1.1071e-14 8.6732e-16 37
|
|
// 23 13.3366 1.1271e-14 8.4513e-16 41
|
|
// 24 13.8505 1.2806e-14 9.2460e-16 43
|
|
// 25 14.4332 1.3081e-14 9.0637e-16 43
|
|
// 26 14.9301 1.4882e-14 9.9680e-16 46
|
|
// 27 15.5214 1.5571e-14 1.0032e-15 48
|
|
// 28 16.1029 1.7553e-14 1.0900e-15 49
|
|
// 29 16.6407 1.6219e-14 9.7467e-16 53
|
|
// 30 17.1891 1.8560e-14 1.0797e-15 55
|
|
// 31 17.7773 1.8522e-14 1.0419e-15 56
|
|
//
|
|
// The singularvalues and |E|/|A| values were compared to those generated by
|
|
// Mathematica Version 9.0; Wolfram Research, Inc., Champaign IL, 2012.
|
|
// The results were all comparable with singular values agreeing to a large
|
|
// number of decimal places.
|
|
//
|
|
// Timing on an Intel (R) Core (TM) i7-3930K CPU @ 3.20 GHZ (in seconds)
|
|
// for NxN matrices:
|
|
//
|
|
// N |E|/|A| iters bidiag QR U-and-V comperr
|
|
// -------------------------------------------------------
|
|
// 512 3.8632e-15 848 0.341 0.016 1.844 2.203
|
|
// 1024 5.6456e-15 1654 4.279 0.032 18.765 20.844
|
|
// 2048 7.5499e-15 3250 40.421 0.141 186.607 213.216
|
|
//
|
|
// where iters is the number of QR steps taken, bidiag is the computation
|
|
// of the Householder reflection vectors, U-and-V is the composition of
|
|
// Householder reflections and Givens rotations to obtain the orthogonal
|
|
// matrices of the decomposigion, and comperr is the computation E =
|
|
// U^T*A*V - S.
|
|
|
|
namespace gte
|
|
{
|
|
template <typename Real>
|
|
class SingularValueDecomposition
|
|
{
|
|
public:
|
|
// The solver processes MxN symmetric matrices, where M >= N > 1
|
|
// ('numRows' is M and 'numCols' is N) and the matrix is stored in
|
|
// row-major order. The maximum number of iterations
|
|
// ('maxIterations') must be specified for the reduction of a
|
|
// bidiagonal matrix to a diagonal matrix. The goal is to compute
|
|
// MxM orthogonal U, NxN orthogonal V and MxN matrix S for which
|
|
// U^T*A*V = S. The only nonzero entries of S are on the diagonal;
|
|
// the diagonal entries are the singular values of the original
|
|
// matrix.
|
|
SingularValueDecomposition(int numRows, int numCols, unsigned int maxIterations)
|
|
:
|
|
mNumRows(0),
|
|
mNumCols(0),
|
|
mMaxIterations(0)
|
|
{
|
|
if (numCols > 1 && numRows >= numCols && maxIterations > 0)
|
|
{
|
|
mNumRows = numRows;
|
|
mNumCols = numCols;
|
|
mMaxIterations = maxIterations;
|
|
mMatrix.resize(numRows * numCols);
|
|
mDiagonal.resize(numCols);
|
|
mSuperdiagonal.resize(numCols - 1);
|
|
mRGivens.reserve(maxIterations * (numCols - 1));
|
|
mLGivens.reserve(maxIterations * (numCols - 1));
|
|
mFixupDiagonal.resize(numCols);
|
|
mPermutation.resize(numCols);
|
|
mVisited.resize(numCols);
|
|
mTwoInvUTU.resize(numCols);
|
|
mTwoInvVTV.resize(numCols - 2);
|
|
mUVector.resize(numRows);
|
|
mVVector.resize(numCols);
|
|
mWVector.resize(numRows);
|
|
}
|
|
}
|
|
|
|
// A copy of the MxN input is made internally. The order of the
|
|
// singular values is specified by sortType: -1 (decreasing),
|
|
// 0 (no sorting), or +1 (increasing). When sorted, the columns of
|
|
// the orthogonal matrices are ordered accordingly. The return value
|
|
// is the number of iterations consumed when convergence occurred,
|
|
// 0xFFFFFFFF when convergence did not occur or 0 when N <= 1 or
|
|
// M < N was passed to the constructor.
|
|
unsigned int Solve(Real const* input, int sortType)
|
|
{
|
|
if (mNumRows > 0)
|
|
{
|
|
int numElements = mNumRows * mNumCols;
|
|
std::copy(input, input + numElements, mMatrix.begin());
|
|
Bidiagonalize();
|
|
|
|
// Compute 'threshold = multiplier*epsilon*|B|' as the
|
|
// threshold for diagonal entries effectively zero; that is,
|
|
// |d| <= |threshold| implies that d is (effectively) zero.
|
|
// TODO: Allow the caller to pass 'multiplier' to the
|
|
// constructor.
|
|
//
|
|
// We will use the L2-norm |B|, which is the length of the
|
|
// elements of B treated as an NM-tuple. The following code
|
|
// avoids overflow when accumulating the squares of the
|
|
// elements when those elements are large.
|
|
Real maxAbsComp = std::fabs(input[0]);
|
|
for (int i = 1; i < numElements; ++i)
|
|
{
|
|
Real absComp = std::fabs(input[i]);
|
|
if (absComp > maxAbsComp)
|
|
{
|
|
maxAbsComp = absComp;
|
|
}
|
|
}
|
|
|
|
Real norm = (Real)0;
|
|
if (maxAbsComp > (Real)0)
|
|
{
|
|
Real invMaxAbsComp = ((Real)1) / maxAbsComp;
|
|
for (int i = 0; i < numElements; ++i)
|
|
{
|
|
Real ratio = input[i] * invMaxAbsComp;
|
|
norm += ratio * ratio;
|
|
}
|
|
norm = maxAbsComp * std::sqrt(norm);
|
|
}
|
|
|
|
Real const multiplier = (Real)8; // TODO: Expose to caller.
|
|
Real const epsilon = std::numeric_limits<Real>::epsilon();
|
|
Real const threshold = multiplier * epsilon * norm;
|
|
|
|
mRGivens.clear();
|
|
mLGivens.clear();
|
|
for (unsigned int j = 0; j < mMaxIterations; ++j)
|
|
{
|
|
int imin = -1, imax = -1;
|
|
for (int i = mNumCols - 2; i >= 0; --i)
|
|
{
|
|
// When a01 is much smaller than its diagonal
|
|
// neighbors, it is effectively zero.
|
|
Real a00 = mDiagonal[i];
|
|
Real a01 = mSuperdiagonal[i];
|
|
Real a11 = mDiagonal[i + 1];
|
|
Real sum = std::fabs(a00) + std::fabs(a11);
|
|
if (sum + std::fabs(a01) != sum)
|
|
{
|
|
if (imax == -1)
|
|
{
|
|
imax = i;
|
|
}
|
|
imin = i;
|
|
}
|
|
else
|
|
{
|
|
// The superdiagonal term is effectively zero
|
|
// compared to the neighboring diagonal terms.
|
|
if (imin >= 0)
|
|
{
|
|
break;
|
|
}
|
|
}
|
|
}
|
|
|
|
if (imax == -1)
|
|
{
|
|
// The algorithm has converged.
|
|
EnsureNonnegativeDiagonal();
|
|
ComputePermutation(sortType);
|
|
return j;
|
|
}
|
|
|
|
// We need to test diagonal entries of B for zero. For
|
|
// each zero diagonal entry, zero the superdiagonal.
|
|
if (DiagonalEntriesNonzero(imin, imax, threshold))
|
|
{
|
|
// Process the lower-right-most unreduced bidiagonal
|
|
// block.
|
|
DoGolubKahanStep(imin, imax);
|
|
}
|
|
}
|
|
return 0xFFFFFFFF;
|
|
}
|
|
else
|
|
{
|
|
return 0;
|
|
}
|
|
}
|
|
|
|
// Get the singular values of the matrix passed to Solve(...). The
|
|
// input 'singularValues' must have N elements.
|
|
void GetSingularValues(Real* singularValues) const
|
|
{
|
|
if (singularValues && mNumCols > 0)
|
|
{
|
|
if (mPermutation[0] >= 0)
|
|
{
|
|
// Sorting was requested.
|
|
for (int i = 0; i < mNumCols; ++i)
|
|
{
|
|
int p = mPermutation[i];
|
|
singularValues[i] = mDiagonal[p];
|
|
}
|
|
}
|
|
else
|
|
{
|
|
// Sorting was not requested.
|
|
size_t numBytes = mNumCols * sizeof(Real);
|
|
std::memcpy(singularValues, &mDiagonal[0], numBytes);
|
|
}
|
|
}
|
|
}
|
|
|
|
// Accumulate the Householder reflections, the Givens rotations, and
|
|
// the diagonal fix-up matrix to compute the orthogonal matrices U and
|
|
// V for which U^T*A*V = S. The input uMatrix must be MxM and the
|
|
// input vMatrix must be NxN, both stored in row-major order.
|
|
void GetU(Real* uMatrix) const
|
|
{
|
|
if (!uMatrix || mNumCols == 0)
|
|
{
|
|
// Invalid input or the constructor failed.
|
|
return;
|
|
}
|
|
|
|
// Start with the identity matrix.
|
|
std::fill(uMatrix, uMatrix + mNumRows * mNumRows, (Real)0);
|
|
for (int d = 0; d < mNumRows; ++d)
|
|
{
|
|
uMatrix[d + mNumRows * d] = (Real)1;
|
|
}
|
|
|
|
// Multiply the Householder reflections using backward
|
|
// accumulation.
|
|
int r, c;
|
|
for (int i0 = mNumCols - 1, i1 = i0 + 1; i0 >= 0; --i0, --i1)
|
|
{
|
|
// Copy the u vector and 2/Dot(u,u) from the matrix.
|
|
Real twoinvudu = mTwoInvUTU[i0];
|
|
Real const* column = &mMatrix[i0];
|
|
mUVector[i0] = (Real)1;
|
|
for (r = i1; r < mNumRows; ++r)
|
|
{
|
|
mUVector[r] = column[mNumCols * r];
|
|
}
|
|
|
|
// Compute the w vector.
|
|
mWVector[i0] = twoinvudu;
|
|
for (r = i1; r < mNumRows; ++r)
|
|
{
|
|
mWVector[r] = (Real)0;
|
|
for (c = i1; c < mNumRows; ++c)
|
|
{
|
|
mWVector[r] += mUVector[c] * uMatrix[r + mNumRows * c];
|
|
}
|
|
mWVector[r] *= twoinvudu;
|
|
}
|
|
|
|
// Update the matrix, U <- U - u*w^T.
|
|
for (r = i0; r < mNumRows; ++r)
|
|
{
|
|
for (c = i0; c < mNumRows; ++c)
|
|
{
|
|
uMatrix[c + mNumRows * r] -= mUVector[r] * mWVector[c];
|
|
}
|
|
}
|
|
}
|
|
|
|
// Multiply the Givens rotations.
|
|
for (auto const& givens : mLGivens)
|
|
{
|
|
int j0 = givens.index0;
|
|
int j1 = givens.index1;
|
|
for (r = 0; r < mNumRows; ++r, j0 += mNumRows, j1 += mNumRows)
|
|
{
|
|
Real& q0 = uMatrix[j0];
|
|
Real& q1 = uMatrix[j1];
|
|
Real prd0 = givens.cs * q0 - givens.sn * q1;
|
|
Real prd1 = givens.sn * q0 + givens.cs * q1;
|
|
q0 = prd0;
|
|
q1 = prd1;
|
|
}
|
|
}
|
|
|
|
if (mPermutation[0] >= 0)
|
|
{
|
|
// Sorting was requested.
|
|
std::fill(mVisited.begin(), mVisited.end(), 0);
|
|
for (c = 0; c < mNumCols; ++c)
|
|
{
|
|
if (mVisited[c] == 0 && mPermutation[c] != c)
|
|
{
|
|
// The item starts a cycle with 2 or more elements.
|
|
int start = c, current = c, next;
|
|
for (r = 0; r < mNumRows; ++r)
|
|
{
|
|
mWVector[r] = uMatrix[c + mNumRows * r];
|
|
}
|
|
while ((next = mPermutation[current]) != start)
|
|
{
|
|
mVisited[current] = 1;
|
|
for (r = 0; r < mNumRows; ++r)
|
|
{
|
|
uMatrix[current + mNumRows * r] =
|
|
uMatrix[next + mNumRows * r];
|
|
}
|
|
current = next;
|
|
}
|
|
mVisited[current] = 1;
|
|
for (r = 0; r < mNumRows; ++r)
|
|
{
|
|
uMatrix[current + mNumRows * r] = mWVector[r];
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
|
|
void GetV(Real* vMatrix) const
|
|
{
|
|
if (!vMatrix || mNumCols == 0)
|
|
{
|
|
// Invalid input or the constructor failed.
|
|
return;
|
|
}
|
|
|
|
// Start with the identity matrix.
|
|
std::fill(vMatrix, vMatrix + mNumCols * mNumCols, (Real)0);
|
|
for (int d = 0; d < mNumCols; ++d)
|
|
{
|
|
vMatrix[d + mNumCols * d] = (Real)1;
|
|
}
|
|
|
|
// Multiply the Householder reflections using backward accumulation.
|
|
int i0 = mNumCols - 3;
|
|
int i1 = i0 + 1;
|
|
int i2 = i0 + 2;
|
|
int r, c;
|
|
for (/**/; i0 >= 0; --i0, --i1, --i2)
|
|
{
|
|
// Copy the v vector and 2/Dot(v,v) from the matrix.
|
|
Real twoinvvdv = mTwoInvVTV[i0];
|
|
Real const* row = &mMatrix[mNumCols * i0];
|
|
mVVector[i1] = (Real)1;
|
|
for (r = i2; r < mNumCols; ++r)
|
|
{
|
|
mVVector[r] = row[r];
|
|
}
|
|
|
|
// Compute the w vector.
|
|
mWVector[i1] = twoinvvdv;
|
|
for (r = i2; r < mNumCols; ++r)
|
|
{
|
|
mWVector[r] = (Real)0;
|
|
for (c = i2; c < mNumCols; ++c)
|
|
{
|
|
mWVector[r] += mVVector[c] * vMatrix[r + mNumCols * c];
|
|
}
|
|
mWVector[r] *= twoinvvdv;
|
|
}
|
|
|
|
// Update the matrix, V <- V - v*w^T.
|
|
for (r = i1; r < mNumCols; ++r)
|
|
{
|
|
for (c = i1; c < mNumCols; ++c)
|
|
{
|
|
vMatrix[c + mNumCols * r] -= mVVector[r] * mWVector[c];
|
|
}
|
|
}
|
|
}
|
|
|
|
// Multiply the Givens rotations.
|
|
for (auto const& givens : mRGivens)
|
|
{
|
|
int j0 = givens.index0;
|
|
int j1 = givens.index1;
|
|
for (c = 0; c < mNumCols; ++c, j0 += mNumCols, j1 += mNumCols)
|
|
{
|
|
Real& q0 = vMatrix[j0];
|
|
Real& q1 = vMatrix[j1];
|
|
Real prd0 = givens.cs * q0 - givens.sn * q1;
|
|
Real prd1 = givens.sn * q0 + givens.cs * q1;
|
|
q0 = prd0;
|
|
q1 = prd1;
|
|
}
|
|
}
|
|
|
|
// Fix-up the diagonal.
|
|
for (r = 0; r < mNumCols; ++r)
|
|
{
|
|
for (c = 0; c < mNumCols; ++c)
|
|
{
|
|
vMatrix[c + mNumCols * r] *= mFixupDiagonal[c];
|
|
}
|
|
}
|
|
|
|
if (mPermutation[0] >= 0)
|
|
{
|
|
// Sorting was requested.
|
|
std::fill(mVisited.begin(), mVisited.end(), 0);
|
|
for (c = 0; c < mNumCols; ++c)
|
|
{
|
|
if (mVisited[c] == 0 && mPermutation[c] != c)
|
|
{
|
|
// The item starts a cycle with 2 or more elements.
|
|
int start = c, current = c, next;
|
|
for (r = 0; r < mNumCols; ++r)
|
|
{
|
|
mWVector[r] = vMatrix[c + mNumCols * r];
|
|
}
|
|
while ((next = mPermutation[current]) != start)
|
|
{
|
|
mVisited[current] = 1;
|
|
for (r = 0; r < mNumCols; ++r)
|
|
{
|
|
vMatrix[current + mNumCols * r] =
|
|
vMatrix[next + mNumCols * r];
|
|
}
|
|
current = next;
|
|
}
|
|
mVisited[current] = 1;
|
|
for (r = 0; r < mNumCols; ++r)
|
|
{
|
|
vMatrix[current + mNumCols * r] = mWVector[r];
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
|
|
// Compute a single column of U or V. The reflections and rotations
|
|
// are applied incrementally. This is useful when you want only a
|
|
// small number of the singular values or vectors.
|
|
void GetUColumn(int index, Real* uColumn) const
|
|
{
|
|
if (0 <= index && index < mNumRows)
|
|
{
|
|
// y = H*x, then x and y are swapped for the next H
|
|
Real* x = uColumn;
|
|
Real* y = &mWVector[0];
|
|
|
|
// Start with the Euclidean basis vector.
|
|
std::memset(x, 0, mNumRows * sizeof(Real));
|
|
if (index < mNumCols && mPermutation[0] >= 0)
|
|
{
|
|
// Sorting was requested.
|
|
x[mPermutation[index]] = (Real)1;
|
|
}
|
|
else
|
|
{
|
|
x[index] = (Real)1;
|
|
}
|
|
|
|
// Apply the Givens rotations.
|
|
for (auto const& givens : gte::reverse(mLGivens))
|
|
{
|
|
Real& xr0 = x[givens.index0];
|
|
Real& xr1 = x[givens.index1];
|
|
Real tmp0 = givens.cs * xr0 + givens.sn * xr1;
|
|
Real tmp1 = -givens.sn * xr0 + givens.cs * xr1;
|
|
xr0 = tmp0;
|
|
xr1 = tmp1;
|
|
}
|
|
|
|
// Apply the Householder reflections.
|
|
for (int c = mNumCols - 1; c >= 0; --c)
|
|
{
|
|
// Get the Householder vector u.
|
|
int r;
|
|
for (r = 0; r < c; ++r)
|
|
{
|
|
y[r] = x[r];
|
|
}
|
|
|
|
// Compute s = Dot(x,u) * 2/u^T*u.
|
|
Real s = x[r]; // r = c, u[r] = 1
|
|
for (int j = r + 1; j < mNumRows; ++j)
|
|
{
|
|
s += x[j] * mMatrix[c + mNumCols * j];
|
|
}
|
|
s *= mTwoInvUTU[c];
|
|
|
|
// r = c, y[r] = x[r]*u[r] - s = x[r] - s because u[r] = 1
|
|
y[r] = x[r] - s;
|
|
|
|
// Compute the remaining components of y.
|
|
for (++r; r < mNumRows; ++r)
|
|
{
|
|
y[r] = x[r] - s * mMatrix[c + mNumCols * r];
|
|
}
|
|
|
|
std::swap(x, y);
|
|
}
|
|
|
|
// The final product is stored in x.
|
|
if (x != uColumn)
|
|
{
|
|
size_t numBytes = mNumRows * sizeof(Real);
|
|
std::memcpy(uColumn, x, numBytes);
|
|
}
|
|
}
|
|
}
|
|
|
|
void GetVColumn(int index, Real* vColumn) const
|
|
{
|
|
if (0 <= index && index < mNumCols)
|
|
{
|
|
// y = H*x, then x and y are swapped for the next H
|
|
Real* x = vColumn;
|
|
Real* y = &mWVector[0];
|
|
|
|
// Start with the Euclidean basis vector.
|
|
std::memset(x, 0, mNumCols * sizeof(Real));
|
|
if (mPermutation[0] >= 0)
|
|
{
|
|
// Sorting was requested.
|
|
int p = mPermutation[index];
|
|
x[p] = mFixupDiagonal[p];
|
|
}
|
|
else
|
|
{
|
|
x[index] = mFixupDiagonal[index];
|
|
}
|
|
|
|
// Apply the Givens rotations.
|
|
for (auto const& givens : gte::reverse(mRGivens))
|
|
{
|
|
Real& xr0 = x[givens.index0];
|
|
Real& xr1 = x[givens.index1];
|
|
Real tmp0 = givens.cs * xr0 + givens.sn * xr1;
|
|
Real tmp1 = -givens.sn * xr0 + givens.cs * xr1;
|
|
xr0 = tmp0;
|
|
xr1 = tmp1;
|
|
}
|
|
|
|
// Apply the Householder reflections.
|
|
for (int r = mNumCols - 3; r >= 0; --r)
|
|
{
|
|
// Get the Householder vector v.
|
|
int c;
|
|
for (c = 0; c < r + 1; ++c)
|
|
{
|
|
y[c] = x[c];
|
|
}
|
|
|
|
// Compute s = Dot(x,v) * 2/v^T*v.
|
|
Real s = x[c]; // c = r+1, v[c] = 1
|
|
for (int j = c + 1; j < mNumCols; ++j)
|
|
{
|
|
s += x[j] * mMatrix[j + mNumCols * r];
|
|
}
|
|
s *= mTwoInvVTV[r];
|
|
|
|
// c = r+1, y[c] = x[c]*v[c] - s = x[c] - s
|
|
// because v[c] = 1
|
|
y[c] = x[c] - s;
|
|
|
|
// Compute the remaining components of y.
|
|
for (++c; c < mNumCols; ++c)
|
|
{
|
|
y[c] = x[c] - s * mMatrix[c + mNumCols * r];
|
|
}
|
|
|
|
std::swap(x, y);
|
|
}
|
|
|
|
// The final product is stored in x.
|
|
if (x != vColumn)
|
|
{
|
|
size_t numBytes = mNumCols * sizeof(Real);
|
|
std::memcpy(vColumn, x, numBytes);
|
|
}
|
|
}
|
|
}
|
|
|
|
Real GetSingularValue(int index) const
|
|
{
|
|
if (0 <= index && index < mNumCols)
|
|
{
|
|
if (mPermutation[0] >= 0)
|
|
{
|
|
// Sorting was requested.
|
|
return mDiagonal[mPermutation[index]];
|
|
}
|
|
else
|
|
{
|
|
// Sorting was not requested.
|
|
return mDiagonal[index];
|
|
}
|
|
}
|
|
else
|
|
{
|
|
return (Real)0;
|
|
}
|
|
}
|
|
|
|
private:
|
|
// Bidiagonalize using Householder reflections. On input, mMatrix is
|
|
// a copy of the input matrix and has one extra row. On output, the
|
|
// diagonal and superdiagonal contain the bidiagonalized results. The
|
|
// lower-triangular portion stores the essential parts of the
|
|
// Householder u vectors (the elements of u after the leading 1-valued
|
|
// component) and the upper-triangular portion stores the essential
|
|
// parts of the Householder v vectors. To avoid recomputing
|
|
// 2/Dot(u,u) and 2/Dot(v,v), these quantities are stored in
|
|
// mTwoInvUTU and mTwoInvVTV.
|
|
void Bidiagonalize()
|
|
{
|
|
int r, c;
|
|
for (int i = 0, ip1 = 1; i < mNumCols; ++i, ++ip1)
|
|
{
|
|
// Compute the U-Householder vector.
|
|
Real length = (Real)0;
|
|
for (r = i; r < mNumRows; ++r)
|
|
{
|
|
Real ur = mMatrix[i + mNumCols * r];
|
|
mUVector[r] = ur;
|
|
length += ur * ur;
|
|
}
|
|
Real udu = (Real)1;
|
|
length = std::sqrt(length);
|
|
if (length > (Real)0)
|
|
{
|
|
Real& u1 = mUVector[i];
|
|
Real sgn = (u1 >= (Real)0 ? (Real)1 : (Real)-1);
|
|
Real invDenom = (Real)1 / (u1 + sgn * length);
|
|
u1 = (Real)1;
|
|
for (r = ip1; r < mNumRows; ++r)
|
|
{
|
|
Real& ur = mUVector[r];
|
|
ur *= invDenom;
|
|
udu += ur * ur;
|
|
}
|
|
}
|
|
|
|
// Compute the rank-1 offset u*w^T.
|
|
Real invudu = (Real)1 / udu;
|
|
Real twoinvudu = invudu * (Real)2;
|
|
for (c = i; c < mNumCols; ++c)
|
|
{
|
|
mWVector[c] = (Real)0;
|
|
for (r = i; r < mNumRows; ++r)
|
|
{
|
|
mWVector[c] += mMatrix[c + mNumCols * r] * mUVector[r];
|
|
}
|
|
mWVector[c] *= twoinvudu;
|
|
}
|
|
|
|
// Update the input matrix.
|
|
for (r = i; r < mNumRows; ++r)
|
|
{
|
|
for (c = i; c < mNumCols; ++c)
|
|
{
|
|
mMatrix[c + mNumCols * r] -= mUVector[r] * mWVector[c];
|
|
}
|
|
}
|
|
|
|
if (i < mNumCols - 2)
|
|
{
|
|
// Compute the V-Householder vectors.
|
|
length = (Real)0;
|
|
for (c = ip1; c < mNumCols; ++c)
|
|
{
|
|
Real vc = mMatrix[c + mNumCols * i];
|
|
mVVector[c] = vc;
|
|
length += vc * vc;
|
|
}
|
|
Real vdv = (Real)1;
|
|
length = std::sqrt(length);
|
|
if (length > (Real)0)
|
|
{
|
|
Real& v1 = mVVector[ip1];
|
|
Real sgn = (v1 >= (Real)0 ? (Real)1 : (Real)-1);
|
|
Real invDenom = (Real)1 / (v1 + sgn * length);
|
|
v1 = (Real)1;
|
|
for (c = ip1 + 1; c < mNumCols; ++c)
|
|
{
|
|
Real& vc = mVVector[c];
|
|
vc *= invDenom;
|
|
vdv += vc * vc;
|
|
}
|
|
}
|
|
|
|
// Compute the rank-1 offset w*v^T.
|
|
Real invvdv = (Real)1 / vdv;
|
|
Real twoinvvdv = invvdv * (Real)2;
|
|
for (r = i; r < mNumRows; ++r)
|
|
{
|
|
mWVector[r] = (Real)0;
|
|
for (c = ip1; c < mNumCols; ++c)
|
|
{
|
|
mWVector[r] += mMatrix[c + mNumCols * r] * mVVector[c];
|
|
}
|
|
mWVector[r] *= twoinvvdv;
|
|
}
|
|
|
|
// Update the input matrix.
|
|
for (r = i; r < mNumRows; ++r)
|
|
{
|
|
for (c = ip1; c < mNumCols; ++c)
|
|
{
|
|
mMatrix[c + mNumCols * r] -= mWVector[r] * mVVector[c];
|
|
}
|
|
}
|
|
|
|
mTwoInvVTV[i] = twoinvvdv;
|
|
for (c = i + 2; c < mNumCols; ++c)
|
|
{
|
|
mMatrix[c + mNumCols * i] = mVVector[c];
|
|
}
|
|
}
|
|
|
|
mTwoInvUTU[i] = twoinvudu;
|
|
for (r = ip1; r < mNumRows; ++r)
|
|
{
|
|
mMatrix[i + mNumCols * r] = mUVector[r];
|
|
}
|
|
}
|
|
|
|
// Copy the diagonal and subdiagonal for cache coherence in the
|
|
// Golub-Kahan iterations.
|
|
int k, ksup = mNumCols - 1, index = 0, delta = mNumCols + 1;
|
|
for (k = 0; k < ksup; ++k, index += delta)
|
|
{
|
|
mDiagonal[k] = mMatrix[index];
|
|
mSuperdiagonal[k] = mMatrix[index + 1];
|
|
}
|
|
mDiagonal[k] = mMatrix[index];
|
|
}
|
|
|
|
// A helper for generating Givens rotation sine and cosine robustly.
|
|
void GetSinCos(Real x, Real y, Real& cs, Real& sn)
|
|
{
|
|
// Solves sn*x + cs*y = 0 robustly.
|
|
Real tau;
|
|
if (y != (Real)0)
|
|
{
|
|
if (std::fabs(y) > std::fabs(x))
|
|
{
|
|
tau = -x / y;
|
|
sn = (Real)1 / std::sqrt((Real)1 + tau * tau);
|
|
cs = sn * tau;
|
|
}
|
|
else
|
|
{
|
|
tau = -y / x;
|
|
cs = (Real)1 / std::sqrt((Real)1 + tau * tau);
|
|
sn = cs * tau;
|
|
}
|
|
}
|
|
else
|
|
{
|
|
cs = (Real)1;
|
|
sn = (Real)0;
|
|
}
|
|
}
|
|
|
|
// Test for (effectively) zero-valued diagonal entries through all
|
|
// but the last. For each such entry, the B matrix decouples. Perform
|
|
// that decoupling. If there are no zero-valued entries, then the
|
|
// Golub-Kahan step must be performed.
|
|
bool DiagonalEntriesNonzero(int imin, int imax, Real threshold)
|
|
{
|
|
for (int i = imin; i <= imax; ++i)
|
|
{
|
|
if (std::fabs(mDiagonal[i]) <= threshold)
|
|
{
|
|
// Use planar rotations to case the superdiagonal entry
|
|
// out of the matrix, thus producing a row of zeros.
|
|
Real x, z, cs, sn;
|
|
Real y = mSuperdiagonal[i];
|
|
mSuperdiagonal[i] = (Real)0;
|
|
for (int j = i + 1; j <= imax + 1; ++j)
|
|
{
|
|
x = mDiagonal[j];
|
|
GetSinCos(x, y, cs, sn);
|
|
mLGivens.push_back(GivensRotation(i, j, cs, sn));
|
|
mDiagonal[j] = cs * x - sn * y;
|
|
if (j <= imax)
|
|
{
|
|
z = mSuperdiagonal[j];
|
|
mSuperdiagonal[j] = cs * z;
|
|
y = sn * z;
|
|
}
|
|
}
|
|
return false;
|
|
}
|
|
}
|
|
return true;
|
|
}
|
|
|
|
// This is Algorithm 8.3.1 in "Matrix Computations, 2nd edition" by
|
|
// G. H. Golub and C. F. Van Loan.
|
|
void DoGolubKahanStep(int imin, int imax)
|
|
{
|
|
// The implicit shift. Compute the eigenvalue u of the
|
|
// lower-right 2x2 block of A = B^T*B that is closer to b11.
|
|
Real f0 = (imax >= (Real)1 ? mSuperdiagonal[imax - 1] : (Real)0);
|
|
Real d1 = mDiagonal[imax];
|
|
Real f1 = mSuperdiagonal[imax];
|
|
Real d2 = mDiagonal[imax + 1];
|
|
Real a00 = d1 * d1 + f0 * f0;
|
|
Real a01 = d1 * f1;
|
|
Real a11 = d2 * d2 + f1 * f1;
|
|
Real dif = (a00 - a11) * (Real)0.5;
|
|
Real sgn = (dif >= (Real)0 ? (Real)1 : (Real)-1);
|
|
Real a01sqr = a01 * a01;
|
|
Real u = a11 - a01sqr / (dif + sgn * std::sqrt(dif * dif + a01sqr));
|
|
Real x = mDiagonal[imin] * mDiagonal[imin] - u;
|
|
Real y = mDiagonal[imin] * mSuperdiagonal[imin];
|
|
|
|
Real a12, a21, a22, a23, cs, sn;
|
|
Real a02 = (Real)0;
|
|
int i0 = imin - 1, i1 = imin, i2 = imin + 1;
|
|
for (/**/; i1 <= imax; ++i0, ++i1, ++i2)
|
|
{
|
|
// Compute the Givens rotation G and save it for use in
|
|
// computing V in U^T*A*V = S.
|
|
GetSinCos(x, y, cs, sn);
|
|
mRGivens.push_back(GivensRotation(i1, i2, cs, sn));
|
|
|
|
// Update B0 = B*G.
|
|
if (i1 > imin)
|
|
{
|
|
mSuperdiagonal[i0] = cs * mSuperdiagonal[i0] - sn * a02;
|
|
}
|
|
|
|
a11 = mDiagonal[i1];
|
|
a12 = mSuperdiagonal[i1];
|
|
a22 = mDiagonal[i2];
|
|
mDiagonal[i1] = cs * a11 - sn * a12;
|
|
mSuperdiagonal[i1] = sn * a11 + cs * a12;
|
|
mDiagonal[i2] = cs * a22;
|
|
a21 = -sn * a22;
|
|
|
|
// Update the parameters for the next Givens rotations.
|
|
x = mDiagonal[i1];
|
|
y = a21;
|
|
|
|
// Compute the Givens rotation G and save it for use in
|
|
// computing U in U^T*A*V = S.
|
|
GetSinCos(x, y, cs, sn);
|
|
mLGivens.push_back(GivensRotation(i1, i2, cs, sn));
|
|
|
|
// Update B1 = G^T*B0.
|
|
a11 = mDiagonal[i1];
|
|
a12 = mSuperdiagonal[i1];
|
|
a22 = mDiagonal[i2];
|
|
mDiagonal[i1] = cs * a11 - sn * a21;
|
|
mSuperdiagonal[i1] = cs * a12 - sn * a22;
|
|
mDiagonal[i2] = sn * a12 + cs * a22;
|
|
|
|
if (i1 < imax)
|
|
{
|
|
a23 = mSuperdiagonal[i2];
|
|
a02 = -sn * a23;
|
|
mSuperdiagonal[i2] = cs * a23;
|
|
|
|
// Update the parameters for the next Givens rotations.
|
|
x = mSuperdiagonal[i1];
|
|
y = a02;
|
|
}
|
|
}
|
|
}
|
|
|
|
// The diagonal entries are not guaranteed to be nonnegative during
|
|
// the construction. After convergence to a diagonal matrix S, test
|
|
// for negative entries and build a diagonal matrix that reverses the
|
|
// sign on the S-entry.
|
|
void EnsureNonnegativeDiagonal()
|
|
{
|
|
for (int i = 0; i < mNumCols; ++i)
|
|
{
|
|
if (mDiagonal[i] >= (Real)0)
|
|
{
|
|
mFixupDiagonal[i] = (Real)1;
|
|
}
|
|
else
|
|
{
|
|
mDiagonal[i] = -mDiagonal[i];
|
|
mFixupDiagonal[i] = (Real)-1;
|
|
}
|
|
}
|
|
}
|
|
|
|
// Sort the singular values and compute the corresponding permutation
|
|
// of the indices of the array storing the singular values. The
|
|
// permutation is used for reordering the singular values and the
|
|
// corresponding columns of the orthogonal matrix in the calls to
|
|
// GetSingularValues(...) and GetOrthogonalMatrices(...).
|
|
void ComputePermutation(int sortType)
|
|
{
|
|
if (sortType == 0)
|
|
{
|
|
// Set a flag for GetSingularValues() and
|
|
// GetOrthogonalMatrices() to know that sorted output was not
|
|
// requested.
|
|
mPermutation[0] = -1;
|
|
return;
|
|
}
|
|
|
|
// Compute the permutation induced by sorting. Initially, we
|
|
// start with the identity permutation I = (0,1,...,N-1).
|
|
struct SortItem
|
|
{
|
|
Real singularValue;
|
|
int index;
|
|
};
|
|
|
|
std::vector<SortItem> items(mNumCols);
|
|
int i;
|
|
for (i = 0; i < mNumCols; ++i)
|
|
{
|
|
items[i].singularValue = mDiagonal[i];
|
|
items[i].index = i;
|
|
}
|
|
|
|
if (sortType > 0)
|
|
{
|
|
std::sort(items.begin(), items.end(),
|
|
[](SortItem const& item0, SortItem const& item1)
|
|
{
|
|
return item0.singularValue < item1.singularValue;
|
|
}
|
|
);
|
|
}
|
|
else
|
|
{
|
|
std::sort(items.begin(), items.end(),
|
|
[](SortItem const& item0, SortItem const& item1)
|
|
{
|
|
return item0.singularValue > item1.singularValue;
|
|
}
|
|
);
|
|
}
|
|
|
|
i = 0;
|
|
for (auto const& item : items)
|
|
{
|
|
mPermutation[i++] = item.index;
|
|
}
|
|
|
|
// GetOrthogonalMatrices() has nontrivial code for computing the
|
|
// orthogonal U and V from the reflections and rotations. To
|
|
// avoid complicating the code further when sorting is requested,
|
|
// U and V are computed as in the unsorted case. We then need to
|
|
// swap columns of U and V to be consistent with the sorting of
|
|
// the singular values. To minimize copying due to column swaps,
|
|
// we use permutation P. The minimum number of transpositions to
|
|
// obtain P from I is N minus the number of cycles of P. Each
|
|
// cycle is reordered with a minimum number of transpositions;
|
|
// that is, the singular items are cyclically swapped, leading to
|
|
// a minimum amount of copying. For example, if there is a
|
|
// cycle i0 -> i1 -> i2 -> i3, then the copying is
|
|
// save = singularitem[i0];
|
|
// singularitem[i1] = singularitem[i2];
|
|
// singularitem[i2] = singularitem[i3];
|
|
// singularitem[i3] = save;
|
|
}
|
|
|
|
// The number rows and columns of the matrices to be processed.
|
|
int mNumRows, mNumCols;
|
|
|
|
// The maximum number of iterations for reducing the bidiagonal matrix
|
|
// to a diagonal matrix.
|
|
unsigned int mMaxIterations;
|
|
|
|
// The internal copy of a matrix passed to the solver. See the
|
|
// comments about function Bidiagonalize() about what is stored in the
|
|
// matrix.
|
|
std::vector<Real> mMatrix; // MxN elements
|
|
|
|
// After the initial bidiagonalization by Householder reflections, we
|
|
// no longer need the full mMatrix. Copy the diagonal and
|
|
// superdiagonal entries to linear arrays in order to be cache
|
|
// friendly.
|
|
std::vector<Real> mDiagonal; // N elements
|
|
std::vector<Real> mSuperdiagonal; // N-1 elements
|
|
|
|
// The Givens rotations used to reduce the initial bidiagonal matrix
|
|
// to a diagonal matrix. A rotation is the identity with the following
|
|
// replacement entries: R(index0,index0) = cs, R(index0,index1) = sn,
|
|
// R(index1,index0) = -sn, and R(index1,index1) = cs. If N is the
|
|
// number of matrix columns and K is the maximum number of iterations,
|
|
// the maximum number of right or left Givens rotations is K*(N-1).
|
|
// The maximum amount of memory is allocated to store these. However,
|
|
// we also potentially need left rotations to decouple the matrix when
|
|
// diagonal terms are zero. Worst case is a number of matrices
|
|
// quadratic in N, so for now we just use std::vector<Rotation> whose
|
|
// initial capacity is K*(N-1).
|
|
struct GivensRotation
|
|
{
|
|
// No default initialization for fast creation of std::vector of
|
|
// objects of this type.
|
|
GivensRotation() = default;
|
|
|
|
GivensRotation(int inIndex0, int inIndex1, Real inCs, Real inSn)
|
|
:
|
|
index0(inIndex0),
|
|
index1(inIndex1),
|
|
cs(inCs),
|
|
sn(inSn)
|
|
{
|
|
}
|
|
|
|
int index0, index1;
|
|
Real cs, sn;
|
|
};
|
|
|
|
std::vector<GivensRotation> mRGivens;
|
|
std::vector<GivensRotation> mLGivens;
|
|
|
|
// The diagonal matrix that is used to convert S-entries to
|
|
// nonnegative.
|
|
std::vector<Real> mFixupDiagonal; // N elements
|
|
|
|
// When sorting is requested, the permutation associated with the sort
|
|
// is stored in mPermutation. When sorting is not requested,
|
|
// mPermutation[0] is set to -1. mVisited is used for finding cycles
|
|
// in the permutation.
|
|
std::vector<int> mPermutation; // N elements
|
|
mutable std::vector<int> mVisited; // N elements
|
|
|
|
// Temporary storage to compute Householder reflections and to support
|
|
// sorting of columns of the orthogonal matrices.
|
|
std::vector<Real> mTwoInvUTU; // N elements
|
|
std::vector<Real> mTwoInvVTV; // N-2 elements
|
|
mutable std::vector<Real> mUVector; // M elements
|
|
mutable std::vector<Real> mVVector; // N elements
|
|
mutable std::vector<Real> mWVector; // max(M,N) elements
|
|
};
|
|
}
|
|
|