Multiplication Algorithm
A high-speed VLSI multiplication algorithm that internally uses redundant binary representation is proposed. In n-bit binary integer multiplication, n partial products are first generated and then added up pairwise by a binary tree of redundant binary adders. Since two n-digit redundant binary numbers can be added in parallel in constant time, independent of n and without carry propagation, n-bit multiplication can be performed in time proportional to log₂ n. The computation time is almost the same as that of a multiplier with a Wallace tree, in which three partial products are converted into two (in contrast to our two-to-one conversion), and is much shorter than that of an array multiplier for longer operands. The number of computation elements in an n-bit multiplier based on the algorithm is proportional to n², almost the same as in conventional multipliers. Furthermore, since the multiplier has a regular cellular array structure similar to that of an array multiplier, it is suitable for VLSI implementation.
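The paragraph above rests on two ideas: redundant binary (signed-digit) addition needs no carry chain, and a binary tree of such adders reduces n partial products in ⌈log₂ n⌉ levels. The sketch below illustrates both in software, assuming digits in {-1, 0, 1} stored least significant first. The digit-selection rule in `rb_add` is one standard carry-free rule for this digit set, used here as an assumption; the paper's own adder cell and binary encoding of digits may differ. The helper names `rb_add`, `rb_value`, `to_rb`, and `rb_multiply` are hypothetical.

```python
# Sketch: carry-free redundant binary addition and a pairwise partial-product tree.
# Digits are in {-1, 0, 1}, least significant first.
# ASSUMPTION: this digit-selection rule is one standard carry-free rule; the
# proposed hardware may use a different cell design and encoding.

def rb_add(x, y):
    """Add two redundant binary numbers without a carry-propagation chain."""
    n = max(len(x), len(y), 1)
    x = x + [0] * (n - len(x))
    y = y + [0] * (n - len(y))
    carries, sums = [], []
    for i in range(n):
        t = x[i] + y[i]
        # Does the next lower position hold no -1 digit? (Position -1 counts as yes.)
        low_nonneg = i == 0 or (x[i - 1] >= 0 and y[i - 1] >= 0)
        if t == 2:
            c, s = 1, 0
        elif t == 1:
            c, s = (1, -1) if low_nonneg else (0, 1)
        elif t == 0:
            c, s = 0, 0
        elif t == -1:
            c, s = (0, -1) if low_nonneg else (-1, 1)
        else:  # t == -2
            c, s = -1, 0
        carries.append(c)
        sums.append(s)
    # Digit i depends only on positions i, i-1 and i-2, so all digits can be
    # formed in parallel in constant time; the result stays in {-1, 0, 1}.
    z = [sums[i] + (carries[i - 1] if i > 0 else 0) for i in range(n)]
    z.append(carries[-1])
    return z


def rb_value(d):
    """Integer value of a redundant binary digit list."""
    return sum(digit << i for i, digit in enumerate(d))


def to_rb(v):
    """Ordinary binary digits (all 0 or 1) of a non-negative integer."""
    return [int(b) for b in reversed(bin(v)[2:])]


def rb_multiply(a, b, n):
    """n-bit unsigned multiply (n >= 1): partial products summed by a binary adder tree."""
    level = [to_rb(a << i) if (b >> i) & 1 else [0] for i in range(n)]
    depth = 0
    while len(level) > 1:
        nxt = [rb_add(level[k], level[k + 1]) for k in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # the odd one out passes to the next level
        level = nxt
        depth += 1
    return level[0], depth
```

For 8-bit operands, `rb_multiply(200, 123, 8)` returns a digit list whose `rb_value` is 24600, with a tree depth of 3 (8 → 4 → 2 → 1). Here `rb_value` simply evaluates the result as a Python integer; a hardware multiplier would still need a final conversion from redundant binary back to ordinary binary.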
The multiplication algorithm [Wells, p. 44] is commonly known as Russian Peasant Multiplication. It is even said that the algorithm "is still used by peasants in some areas, such as Russia." However, the origin of the "Russian Peasant" designation is unexpectedly murky. It probably goes back to a Russian book, a few centuries old, in which the method was first described in (relatively) modern times. I can only conjecture that the algorithm acquired the "Russian" part of its name in the process of translation from Russian, and that the "Peasant" part was appended because of the widespread conviction that (at least in older times) it was the peasant population that almost exclusively, albeit sparsely, filled the Russian vastness.
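The method itself is not reproduced in this excerpt, so the following is a minimal sketch of the commonly described halving-and-doubling form, assuming non-negative integer operands; the function name `russian_peasant` is hypothetical. Repeatedly halve one number and double the other, then add the doubled values from the rows where the halved number is odd. It works because the odd rows pick out exactly the binary digits of the first operand.

```python
def russian_peasant(a, b):
    """Multiply non-negative integers by halving a and doubling b."""
    total = 0
    while a > 0:
        if a % 2 == 1:   # keep rows whose left-hand number is odd
            total += b
        a //= 2          # halve, discarding any remainder
        b *= 2           # double
    return total
```

For 13 × 17 the rows are (13, 17), (6, 34), (3, 68), (1, 136); the odd rows contribute 17 + 68 + 136 = 221.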
The new scheme for partitioning matrices across processors, presented in conjunction with the 3D matrix multiplication algorithm, is applicable to most of the level-3 BLAS. Gustavson has shown that 26 of the 30 level-3 BLAS routines can be expressed in terms of this 3D distribution. This work is ongoing research.
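To make the "3D" idea concrete, here is a sequential simulation of a generic three-dimensional block decomposition, offered as an assumption rather than as Gustavson's actual distribution scheme: a hypothetical q × q × q processor grid in which processor (i, j, k) multiplies blocks A_ik and B_kj, after which the partial blocks are summed along the k axis to form C_ij. Data placement and communication are omitted, and the names `matmul`, `block`, and `matmul_3d` are hypothetical.

```python
def matmul(A, B):
    """Reference product of two list-of-lists matrices."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def block(M, bi, bj, bs):
    """The (bi, bj) block of size bs x bs."""
    return [row[bj * bs:(bj + 1) * bs] for row in M[bi * bs:(bi + 1) * bs]]


def matmul_3d(A, B, q):
    """Simulate a q x q x q processor grid computing C = A B for n x n matrices."""
    n = len(A)
    if n % q:
        raise ValueError("n must be divisible by q")
    bs = n // q
    # Phase 1: "processor" (i, j, k) multiplies its local blocks A_ik and B_kj.
    partial = {(i, j, k): matmul(block(A, i, k, bs), block(B, k, j, bs))
               for i in range(q) for j in range(q) for k in range(q)}
    # Phase 2: sum the partial blocks along the k axis: C_ij = sum_k A_ik B_kj.
    C = [[0] * n for _ in range(n)]
    for (i, j, _k), P in partial.items():
        for r in range(bs):
            for c in range(bs):
                C[i * bs + r][j * bs + c] += P[r][c]
    return C
```

Each of the q³ block products is independent, which is what lets a 3D grid use more processors than a 2D layout for the same matrix size; the price is the reduction along k.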
Karatsuba multiplication is asymptotically an O(N^1.585) algorithm, the exponent being log(3)/log(2), reflecting 3 multiplications each of half the size of the inputs. This is a big improvement over the basecase multiply at O(N^2), and the advantage soon outweighs the extra additions Karatsuba performs. MUL_KARATSUBA_THRESHOLD can be as little as 10 limbs. The corresponding threshold for squaring (SQR) is usually about twice the multiplication (MUL) threshold.
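The three-multiply split can be sketched on Python integers as follows. This is not GMP's limb-based implementation: operands are split by bits rather than limbs, the basecase is simply Python's built-in multiply, and `threshold_bits` is a hypothetical stand-in for a tuning parameter like MUL_KARATSUBA_THRESHOLD. Operands are assumed non-negative.

```python
def karatsuba(x, y, threshold_bits=64):
    """Multiply non-negative integers with Karatsuba's three-multiply split."""
    # Basecase: below the (hypothetical) threshold, use an ordinary multiply.
    if x < (1 << threshold_bits) or y < (1 << threshold_bits):
        return x * y
    m = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << m) - 1
    x1, x0 = x >> m, x & mask          # x = x1 * 2^m + x0
    y1, y0 = y >> m, y & mask          # y = y1 * 2^m + y0
    z2 = karatsuba(x1, y1, threshold_bits)
    z0 = karatsuba(x0, y0, threshold_bits)
    # One extra multiply recovers the cross terms x1*y0 + x0*y1.
    z1 = karatsuba(x1 + x0, y1 + y0, threshold_bits) - z2 - z0
    return (z2 << (2 * m)) + (z1 << m) + z0
```

Each call replaces one product of size N with three of size about N/2, which gives the log(3)/log(2) exponent; the threshold exists because for small operands the extra additions and shifts cost more than the multiplication they save.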
Notes: Winograd's algorithm for fast matrix multiplication reduces the number of multiplications by a factor of two compared with the straightforward algorithm. It is implementable, although the additional bookkeeping it requires makes it doubtful whether it is a win. Expositions of Winograd's algorithm [Win68] include [CLR90, Man89, Win80].
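A sketch of the pairing trick behind the halved multiplication count, as we understand Winograd's inner-product scheme: for each pair of positions, (a0 + b1)(a1 + b0) = a0·b0 + a1·b1 + a0·a1 + b0·b1, and the last two terms depend only on a row of A or only on a column of B, so they can be precomputed once and subtracted. The function name `winograd_matmul` and its multiplication counter are hypothetical, added for illustration.

```python
def winograd_matmul(A, B):
    """C = A B using Winograd's pairing trick; returns (C, multiplication count)."""
    m, n, p = len(A), len(B), len(B[0])
    half = n // 2
    # Precompute one correction term per row of A and per column of B.
    row_f = [sum(A[i][2*k] * A[i][2*k+1] for k in range(half)) for i in range(m)]
    col_f = [sum(B[2*k][j] * B[2*k+1][j] for k in range(half)) for j in range(p)]
    mults = (m + p) * half
    C = [[0] * p for _ in range(m)]
    for i in range(m):
        for j in range(p):
            s = -row_f[i] - col_f[j]
            for k in range(half):
                # (a0 + b1)(a1 + b0) = a0*b0 + a1*b1 + a0*a1 + b0*b1
                s += (A[i][2*k] + B[2*k+1][j]) * (A[i][2*k+1] + B[2*k][j])
            mults += half
            if n % 2:  # odd inner dimension: add the leftover term directly
                s += A[i][n-1] * B[n-1][j]
                mults += 1
            C[i][j] = s
    return C, mults
```

For two 4 × 4 matrices this uses 48 multiplications instead of 64 (about n³/2 + n² in general), but it needs extra additions and the precomputed correction terms, which is the bookkeeping the note refers to. Because the identity uses a1·b1 = b1·a1, it assumes the matrix entries commute.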
A parallel high-performance matrix multiplication algorithm, P_GEMM, based on a three-dimensional approach is presented. For the parallel case, the algorithm is a natural generalization of the serial _GEMM routine. _GEMM computes C = βC + α op(A) op(B), where α and β are scalars, A, B, and C are matrices, and op(X) stands for X, X^T, or X^C (superscript T indicates transpose, and superscript C conjugate transpose). The algorithm has been implemented in both the double-precision and complex double-precision IEEE formats, for all combinations of matrix products involving matrices in their normal, transposed, and conjugate-transposed forms. For all of these data combinations, performance was the same.
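As a reference for what each process ultimately computes, here is a minimal serial sketch of the _GEMM operation, not of the parallel P_GEMM algorithm. The 'N', 'T', 'C' flag letters mirror the BLAS TRANSA/TRANSB convention; the leading-dimension arguments and in-place update of real BLAS are omitted, and the names `op` and `gemm` here are hypothetical.

```python
def op(X, trans):
    """op(X): 'N' -> X, 'T' -> transpose, 'C' -> conjugate transpose."""
    if trans == 'N':
        return [list(row) for row in X]
    if trans == 'T':
        return [list(col) for col in zip(*X)]
    if trans == 'C':
        return [[v.conjugate() for v in col] for col in zip(*X)]
    raise ValueError(f"unknown trans flag {trans!r}")


def gemm(transa, transb, alpha, A, B, beta, C):
    """Return beta*C + alpha*op(A) op(B) as a new matrix (C is left unchanged)."""
    opA, opB = op(A, transa), op(B, transb)
    m, k, n = len(opA), len(opB), len(opB[0])
    if len(opA[0]) != k or len(C) != m or len(C[0]) != n:
        raise ValueError("dimension mismatch")
    return [[beta * C[i][j] + alpha * sum(opA[i][t] * opB[t][j] for t in range(k))
             for j in range(n)] for i in range(m)]
```

For example, `gemm('C', 'N', 2, A, B, 3, C)` forms 3C + 2A^C B; the same call shape covers all nine transpose combinations.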