% mtimesx does a matrix multiply of two inputs (single, double, or sparse) %****************************************************************************** % % MATLAB (R) is a trademark of The Mathworks (R) Corporation % % Function: mtimesx % Filename: mtimesx.m % Programmer: James Tursa % Version: 1.10 % Date: December 08, 2009 % Copyright: (c) 2009 by James Tursa, All Rights Reserved % % This code uses the BSD License: % % Redistribution and use in source and binary forms, with or without % modification, are permitted provided that the following conditions are % met: % % * Redistributions of source code must retain the above copyright % notice, this list of conditions and the following disclaimer. % * Redistributions in binary form must reproduce the above copyright % notice, this list of conditions and the following disclaimer in % the documentation and/or other materials provided with the distribution % % THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" % AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE % IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE % ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE % LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR % CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF % SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS % INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN % CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) % ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE % POSSIBILITY OF SUCH DAMAGE. % %-- % % mtimesx is a fast general purpose matrix and scalar multiply routine that utilizes % BLAS calls and custom code to perform the calculations. mtimesx also has extended % support for n-Dimensional (nD, n > 2) arrays, treating these as arrays of 2D matrices % for the purposes of matrix operations. % % "Doesn't MATLAB already do this?" For 2D matrices, yes, it does. However, MATLAB does % not always implement the most efficent algorithms for memory access, and MATLAB does not % always take full advantage of symmetric cases. The mtimesx 'SPEED' mode attempts to do % both of these to the fullest extent possible. For nD matrices, MATLAB does not have % direct support for this. One is forced to write loops to accomplish the same thing that % mtimesx can do faster. % % The usage is as follows (arguments in brackets [ ] are optional): % % Syntax % % M = mtimesx( [mode] ) % C = mtimesx(A [,transa] ,B [,transb] [,mode]) % % Description % % mtimesx performs the matrix calculation op(A) * op(B), where: % A = A single or double or sparse scalar, matrix, or array. % B = A single or double or sparse scalar, matrix, or array. % transa = A character indicating a pre-operation on A: % transb = A character indicating a pre-operation on B: % The pre-operation can be any of: % 'N' or 'n' = No operation (the default if trans_ is missing) % 'T' or 't' = Transpose % 'C' or 'c' = Conjugate Transpose % 'G' or 'g' = Conjugate (no transpose) % mode = 'MATLAB' or 'SPEED' (sets mode for current and future calculations, % case insensitive, optional) % M is a string indicating the current calculation mode, before setting the new one. % C is the result of the matrix multiply operation. % % Examples: % % C = mtimesx(A,B) % performs the calculation C = A * B % C = mtimesx(A,'T',B) % performs the calculation C = A.' * B % C = mtimesx(A,B,'G') % performs the calculation C = A * conj(B) % C = mtimesx(A,'C',B,'C') % performs the calculation C = A' * B' % % mtimesx has two modes: % % 'MATLAB' mode: This mode attempts to reproduce the MATLAB intrinsic function mtimes % results exactly. When there was a choice between faster code that did not match the % MATLAB intrinsic mtimes function results exactly vs slower code that did match the % MATLAB intrinsic mtimes function results exactly, the choice was made to use the % slower code. Speed improvements were only made in cases that did not cause a mismatch. % Caveat: I have only tested on a PC with later versions of MATLAB. But MATLAB may use % different algorithms for mtimes in earlier versions or on other machines that I was % unable to test, so even this mode may not match the MATLAB intrinsic mtimes function % exactly in some cases. This is the default mode when mtimesx is first loaded and % executed (i.e., the first time you use mtimesx in your MATLAB session and the first % time you use mtimesx after clearing it). You can set this mode for all future % calculations with the command mtimesx('MATLAB') (case insensitive). % % 'SPEED' mode: This mode attempts to reproduce the MATLAB intrinsic function mtimes % results closely, but not necessarily exactly. When there was a choice between faster % code that did not exactly match the MATLAB intrinsic mtimes function vs slower code % that did match the MATLAB intrinsic mtimes function, the choice was made to use the % faster code. Speed improvements were made in all cases that I could identify, even % if they caused a slight mismatch with the MATLAB intrinsic mtimes results. % NOTE: The mismatches are the results of doing calculations in a different order and % are not indicative of being less accurate. You can set this mode for all future % calculations with the command mtimesx('SPEED') (case insensitive). % % Note: You cannot combine double sparse and single inputs, since MATLAB does not % support a single sparse result. You also cannot combine sparse inputs with full % nD (n > 2) inputs, since MATLAB does not support a sparse nD result. The only % exception is a sparse scalar times an nD full array. In that special case, % mtimesx will treat the sparse scalar as a full scalar and return a full nD result. % % Note: The ‘N’, ‘T’, and ‘C’ have the same meanings as the direct inputs to the BLAS % routines. The ‘G’ input has no direct BLAS counterpart, but was relatively easy to % implement in mtimesx and saves time (as opposed to computing conj(A) or conj(B) % explicitly before calling mtimesx). % % mtimesx supports nD inputs. For these cases, the first two dimensions specify the % matrix multiply involved. The remaining dimensions are duplicated and specify the % number of individual matrix multiplies to perform for the result. i.e., mtimesx % treats these cases as arrays of 2D matrices and performs the operation on the % associated parings. For example: % % If A is (2,3,4,5) and B is (3,6,4,5), then % mtimesx(A,B) would result in C(2,6,4,5) % where C(:,:,i,j) = A(:,:,i,j) * B(:,:,i,j), i=1:4, j=1:5 % % which would be equivalent to the MATLAB m-code: % C = zeros(2,6,4,5); % for m=1:4 % for n=1:5 % C(:,:,m,n) = A(:,:,m,n) * B(:,:,m,n); % end % end % % The first two dimensions must conform using the standard matrix multiply rules % taking the transa and transb pre-operations into account, and dimensions 3:end % must match exactly or be singleton (equal to 1). If a dimension is singleton % then it is virtually expanded to the required size (i.e., equivalent to a % repmat operation to get it to a conforming size but without the actual data % copy). For example: % % If A is (2,3,4,5) and B is (3,6,1,5), then % mtimesx(A,B) would result in C(2,6,4,5) % where C(:,:,i,j) = A(:,:,i,j) * B(:,:,1,j), i=1:4, j=1:5 % % which would be equivalent to the MATLAB m-code: % C = zeros(2,6,4,5); % for m=1:4 % for n=1:5 % C(:,:,m,n) = A(:,:,m,n) * B(:,:,1,n); % end % end % % When a transpose (or conjugate transpose) is involved, the first two dimensions % are transposed in the multiply as you would expect. For example: % % If A is (3,2,4,5) and B is (3,6,4,5), then % mtimesx(A,'C',B,'G') would result in C(2,6,4,5) % where C(:,:,i,j) = A(:,:,i,j)' * conj( B(:,:,i,j) ), i=1:4, j=1:5 % % which would be equivalent to the MATLAB m-code: % C = zeros(2,6,4,5); % for m=1:4 % for n=1:5 % C(:,:,m,n) = A(:,:,m,n)' * conj( B(:,:,m,n) ); % end % end % % If A is a scalar (1,1) and B is (3,6,4,5), then % mtimesx(A,'G',B,'C') would result in C(6,3,4,5) % where C(:,:,i,j) = conj(A) * B(:,:,i,j)', i=1:4, j=1:5 % % which would be equivalent to the MATLAB m-code: % C = zeros(6,3,4,5); % for m=1:4 % for n=1:5 % C(:,:,m,n) = conj(A) * B(:,:,m,n)'; % end % end % % --------------------------------------------------------------------------------------------------------------------------------- % % The BLAS routines used are DDOT, DGEMV, DGEMM, DSYRK, and DSYR2K for double % variables, and SDOT, SGEMV, SGEMM, SSYRK, and SSYR2K for single variables. % These routines are (description taken from www.netlib.org): % % DDOT and SDOT: % % * forms the dot product of two vectors. % % DGEMV and SGEMV: % % * DGEMV performs one of the matrix-vector operations % * % * y := alpha*A*x + beta*y, or y := alpha*A'*x + beta*y, % * % * where alpha and beta are scalars, x and y are vectors and A is an % * m by n matrix. % % DGEMM and SGEMM: % % * DGEMM performs one of the matrix-matrix operations % * % * C := alpha*op( A )*op( B ) + beta*C, % * % * where op( X ) is one of % * % * op( X ) = X or op( X ) = X', % * % * alpha and beta are scalars, and A, B and C are matrices, with op( A ) % * an m by k matrix, op( B ) a k by n matrix and C an m by n matrix. % % DSYRK and SSYRK: % % * DSYRK performs one of the symmetric rank k operations % * % * C := alpha*A*A' + beta*C, % * % * or % * % * C := alpha*A'*A + beta*C, % * % * where alpha and beta are scalars, C is an n by n symmetric matrix % * and A is an n by k matrix in the first case and a k by n matrix % * in the second case. % % DSYR2K and SSYR2K: % % * DSYR2K performs one of the symmetric rank 2k operations % * % * C := alpha*A*B' + alpha*B*A' + beta*C, % * % * or % * % * C := alpha*A'*B + alpha*B'*A + beta*C, % * % * where alpha and beta are scalars, C is an n by n symmetric matrix % * and A and B are n by k matrices in the first case and k by n % * matrices in the second case. % % Double sparse matrix operations are supported, but not always directly. % For (matrix) * (scalar) operations, custom code is used to produce a result % that minimizes memory access times. All other operations, such as % (matrix) * (vector) or (matrix) * (matrix), or any operation involving a transpose % or conjugate transpose, are obtained with calls back to the MATLAB intrinsic % mtimes function. Thus for most non-scalar sparse operations, mtimesx is % simply a thin wrapper around the intrinsic MATLAB function and you will see % no speed improvement. % % --------------------------------------------------------------------------------------------------------------------------------- % % Examples: % % C = mtimesx(A,B) % performs the calculation C = A * B % C = mtimesx(A,'T',B,'speed') % performs the calculation C = A.' * B % % using a fast algorithm that may not % % match MATLAB results exactly % mtimesx('matlab') % sets calculation mode to match MATLAB % C = mtimesx(A,B,'g') % performs the calculation C = A * conj(B) % C = mtimesx(A,'c',B,'C') % performs the calculation C = A' * B' % % --------------------------------------------------------------------------------------------------------------------------------- function varargout = mtimesx(varargin) %\ % If you got here then mtimesx is not compiled yet, so go compile it first. %/ mtimesx_build; %\ % Call the mex routine mtimesx. %/ [varargout{1:nargout}] = mtimesx(varargin{:}); end