Parallel QR decomposition of a rectangular matrix

Share Embed


Descripción

Numer. Math. 48, 239-249 (1986)

Numerische Mathematik

9 Springer-Verlag1986

Parallel QR Decomposition of a Rectangular Matrix M. Cosnard, J.-M. Muller and Y. Robert CNRS, Laboratoire TIM3, Institut IMAG, BP 68, F-38402 Saint Martin d'Heres Cedex, France

Summary. We show that the greedy algorithm introduced in [1] and [5] to perform the parallel Q R decomposition of a dense rectangular matrix of size m x n is o p t i m a l Then we assume that m/n 2 tends to zero as m and n go to infinity, and prove that the complexity of such a decomposition is asymptotically 2n, when an unlimited number of processors is available. Subject Classifications: AMS(MOS): 65F05; CR: G1.3.

1. Introduction

Let A be a dense rectangular matrix of size m x n, m >n. The Givens transformations algorithm produces an orthogonal decomposition of A which is used in several problems of numerical analysis, including linear least squares problems. This algorithm requires m n - n ( n + l ) / 2 steps on a sequential machine, each step being the time necessary to achieve a Givens rotation. Parallel versions for S I M D or M I M D computers (Flynn [2]) of the Givens transformations algorithm have been introduced in the literature: see Lord et al. [4], Sameh and Kuck [6] and the survey papers of Heller [3] or Sameh [7]. Assuming that enough processors are available to perform at each step up to [m/2J independent rotations simultaneously (where [ . ] denotes the floor function), Sameh and Kuck [6] propose a scheme whose number of steps is m + n - 2 if m > n and 2 n - 3 if m = n . Recently, Modi and Clarke [5] have introduced and partially analyzed a greedy algorithm to compute the Q R decomposition of an m by n matrix. Their numerical experiments show that this new algorithm takes appreciably fewer stages than Sameh and Kuck's scheme, which is confirmed by an asymptotic theoretical analysis: Modi and Clarke obtain the approximation log 2 m + (n -- 1) log 2 log 2 m which is valid if m goes to infinity, n fixed. In their paper, Modi and Clarke introduce another class of algorithms, namely the Fibonacci schemes, and

240

M. Cosnard et al.

discuss in detail their performances. However, they observe that these schemes seem to be less efficient than the greedy algorithm. To our knowledge, no result has been published so far on the determination of an optimal algorithm for the Q R decomposition by Givens transformations of a rectangular matrix, and the evaluation of its performance. Moreover, all the algorithms presented by Sameh and Kuck [6] and Modi and Clarke [5] are Givens sequences, that is sequences of Givens rotations in which zeros once created are preserved. The question whether temporarily annihilating elements and introducing zeros that will be destroyed later on can lead to any additional parallelism has never been discussed. In this paper (Sect. 2) we prove that the greedy algorithm is always optimal, not only in the class of the Givens sequences, but more generally in the class of all possible parallel algorithms. This result is true for any rectangular matrix, and gives a negative answer to the Modi and Clarke's question concerning the existence of m by n matrices with a very large value of m and smallish n for which the Fibonacci schemes would be more efficient than the greedy algorithm. Furthermore, Modi and Clarke's approximate analysis can now be viewed as a result on the asymptotic complexity of the problem of computing the Q R decomposition by Givens transformations of an m by n matrix in the case m goes to infinity, n fixed. In Sect. 3, we derive a similar result for the case m =o(n2), m>n (recall that f(x)=o(x) means that f(x)/x tends to zero as x goes to infinity). More precisely, let Opt (m, n) denote the number of parallel steps in the greedy algorithm for an m by n matrix, and consider a function f(n) satisfying to the following properties: - l i m f ( n ) = 0 (as n goes to infinity) - f(n)> 1/n for all n. Then Opt(nZ f (n), n) = 2 n + o(n). In particular, this includes the case where m and n are proportional, which is of great interest for the linear least squares problem. We give experimental results to illustrate this case at the end of Sect. 3. Throughout the paper, A is an m by n rectangular matrix. We assume that Ira/2] Givens rotations can be performed simultaneously. We refer to Modi and Clarke [-5] for a straightforward modification of the algorithms if there are not enough processors, as well for discussing error bounds. Finally, we denote lim f(x) the limit of a function f as x goes to infinity.

2. An Optimal Algorithm In this section, we show that the greedy algorithm is optimal for decomposing any rectangular matrix A of size m x n, m > n. We first reduce the problem to a particular class of parallel algorithms, namely the Givens sequences. Moreover we prove that the elements can be annihilated from left to right and from b o t t o m to top: such an algorithm which reduces A to upper triangular form is called a Standard Parallel Givens Sequence. Then we introduce the Greedy Standard Parallel Givens Sequence and show that this algorithm is optimal.

Parallel QR Decomposition

241

Definitions and Notations R(i,j, k), i4:j, 1 < i , j < m and 1 m (n) - n. Clearly, q
Lihat lebih banyak...

Comentarios

Copyright © 2017 DATOSPDF Inc.