A parallel eigenvalue algorithm for sparse matrices




José L. Ramírez, Germán A. Larrazábal

Multidisciplinary Center of Scientific Visualization and Computing (CEMVICC), Faculty of Sciences and Technology (FACYT), University of Carabobo (UC), Av. Montes de Oca, No. 120-267, Edf. FACYT, C.P. 2001, Phone-Fax: (+58 241) 8678243, Valencia, Venezuela
e-mail: {jbarrios, glarraza}@uc.edu.ve

Abstract. In this work we have implemented the classic algorithms to compute the eigenvalues of an n × n symmetric sparse matrix. Since the matrix size runs into the thousands, an efficient structure for the handling and storage of sparse matrices is necessary, and the classic numerical methods must be adapted to this structure. The algorithms obtain all the eigenvalues, and optionally the eigenvectors, of the matrix, and parallelism is used to reduce the execution times. Francis's QR and QL algorithms use similarity transformations to bring the matrix into diagonal form, so the eigenvalues are preserved at each iteration; they are a robust method for computing eigenvalues and their associated eigenvectors. Our study is carried out on modern supercomputers that execute many instructions per CPU cycle and have many processors. We use the UCSparseLib library (University of Carabobo Sparse Library) as the software platform; it provides the necessary and efficient structures for the handling and storage of sparse matrices. We want to improve the execution time of the serial algorithm by applying multithreading with the OpenMP library.

Keywords: Eigenvalues, QR Algorithm, sparse storage format, OpenMP.

1. INTRODUCTION

The standard eigenvalue problem is the determination of nontrivial solutions of the equation

    Ax = λx,    (1)

where A is an n × n matrix. The scalar λ is known as an eigenvalue, or characteristic value, of A, and x is the corresponding eigenvector, or characteristic vector. Equation (1) is one of the basic problems of numerical linear algebra.

Computing the eigenvalues of a matrix from the characteristic equation

    det(A − λI) = 0,    (2)

is not a good option, since the coefficients of the characteristic polynomial cannot be computed accurately in a numerically stable way, in the sense of numerical analysis; see [1].

The problem (1), where the matrices have certain structures, arises widely in physical applications such as control theory and dynamical systems. The matrices can be, for example, symmetric, nonsymmetric, positive definite, or block structured. The key is then to develop theories and numerical methods that take into account the structure and the spectral properties of the problem. In our case, we are interested in algorithms that obtain all the eigenvalues of the matrix in an efficient way. These algorithms are based on the theory of matrix diagonalization through similarity transformations. The following definition, given in [2], establishes when two matrices are similar.

Definition: A square matrix A is similar to a square matrix B if there is a nonsingular square matrix S such that A = S⁻¹BS. The matrix A is unitarily similar to B if S is a unitary matrix.

In [2] it is established that two similar matrices have the same eigenvalues: if B = S⁻¹AS, then A and B have the same eigenvalues with the same multiplicities. Indeed, if Ax = λx, then B(S⁻¹x) = S⁻¹AS(S⁻¹x) = S⁻¹Ax = λ(S⁻¹x). The main idea is to apply numerical methods that obtain the diagonal form of a matrix using similarity transformations. The Jacobi method transforms a symmetric matrix into diagonal form by applying a sequence of planar rotations; this method is analyzed in [3], [4].

In this work we take into account that the matrix has a sparse structure, so it is not advisable to store all the matrix elements; only the significant elements are stored, which avoids the cost in main memory when the matrix has low density. In addition, the main idea of this work is to show how parallelism can be used to rescue a robust method for computing eigenvalues and eigenvectors, the QR method, which was originally developed for dense storage and smaller matrices.

2. STORAGE STRUCTURE

Although different ways to store a matrix exist, the main idea of sparse storage formats is to store and handle efficiently only the nonzero elements of a matrix; these are stored contiguously in linear arrays. The best known formats are Compressed Row Storage (CRS), Compressed Column Storage (CCS), Jagged Diagonal Storage (JDS) and the newer Transpose Jagged Diagonal Storage (TJDS) [5]. In this work the UCSparseLib library was used, which is based on the CRS and CCS storage formats.

2.1 Compressed Row Storage

In the CRS format the nonzero elements are stored by row, using three linear arrays: one real array holding the nonzero values, and two integer arrays, one storing the column index of each nonzero and the other storing the starting position of every row.

To show how the nonzero elements are stored under this scheme, let A be a 5 × 5 matrix with nonzero elements ai,j:

        | a1,1    0      0      0     a1,4 |
        |  0     a2,2   a2,3    0      0   |
    A = |  0     a3,2   a3,3   a3,4    0   |    (3)
        |  0      0     a4,3   a4,4   a4,5 |
        |  0      0      0     a5,4   a5,5 |

Using the CRS format described previously, the nonzero elements are stored in the following arrays:

    values:          a1,1  a1,4  a2,2  a2,3  a3,2  a3,3  a3,4  a4,3  a4,4  a4,5  a5,4  a5,5
    column index:       1     4     2     3     2     3     4     3     4     5     4     5
    start position:     1     3     5     8    11    13
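This scheme maps directly onto a small C structure. The following is a minimal sketch of a CRS container and the matrix-vector product it enables; the names and layout are illustrative assumptions, not UCSparseLib's actual definitions, and indices here are 0-based whereas the example above is 1-based.

    /* Minimal CRS container: a sketch only; UCSparseLib's actual
       structures may differ. Indices are 0-based here. */
    typedef struct {
        int     n;          /* matrix dimension                        */
        int     nnz;        /* number of stored nonzeros               */
        double *values;     /* nonzero values, stored row by row       */
        int    *col_idx;    /* column index of each stored value       */
        int    *row_start;  /* n+1 entries; row i occupies positions
                               row_start[i] .. row_start[i+1]-1        */
    } crs_matrix;

    /* y = A*x in CRS: a single pass over the stored nonzeros. */
    void crs_matvec(const crs_matrix *A, const double *x, double *y)
    {
        for (int i = 0; i < A->n; i++) {
            double s = 0.0;
            for (int k = A->row_start[i]; k < A->row_start[i + 1]; k++)
                s += A->values[k] * x[A->col_idx[k]];
            y[i] = s;
        }
    }

With n + 1 entries in row_start, the nonzeros of row i always occupy positions row_start[i] through row_start[i+1] − 1, so a full pass over the matrix touches each stored element exactly once.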

2.2 Compressed Column Storage

The CCS storage format is similar to the CRS format. It also uses three arrays: one real array to store the nonzero values, and two integer arrays, one storing the row index of each nonzero and the other storing the starting position of every column. The storage of the nonzero elements of the matrix given in (3) in CCS format is shown in the following arrays:

    values:          a1,1  a2,2  a3,2  a2,3  a3,3  a4,3  a1,4  a3,4  a4,4  a5,4  a4,5  a5,5
    row index:          1     2     3     2     3     4     1     3     4     5     4     5
    start position:     1     2     4     7    10    13
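Because CCS is the column-wise mirror of CRS, converting between the two is a linear-time counting pass. The sketch below builds the CCS arrays of a matrix already held in CRS; all names are illustrative assumptions (0-based indices, caller-allocated output arrays of the right sizes), not UCSparseLib's interface.

    #include <stdlib.h>

    /* CRS -> CCS: count nonzeros per column, prefix-sum the counts into
       col_start, then scatter each CRS entry into its column slot. */
    void crs_to_ccs(int n, int nnz,
                    const int *row_start, const int *col_idx, const double *val,
                    int *col_start, int *row_idx, double *cval)
    {
        int *next = calloc(n, sizeof *next);
        for (int k = 0; k < nnz; k++)            /* nonzeros per column       */
            next[col_idx[k]]++;
        col_start[0] = 0;                        /* exclusive prefix sum      */
        for (int j = 0; j < n; j++)
            col_start[j + 1] = col_start[j] + next[j];
        for (int j = 0; j < n; j++)
            next[j] = col_start[j];              /* next free slot per column */
        for (int i = 0; i < n; i++)              /* scatter; rows stay sorted */
            for (int k = row_start[i]; k < row_start[i + 1]; k++) {
                int p = next[col_idx[k]]++;
                row_idx[p] = i;
                cval[p]    = val[k];
            }
        free(next);
    }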

3. QR METHOD

The QR method was developed by Kublanovskaya and Francis around 1961 [6]. It is an iterative method conceived to obtain all the eigenvalues of a matrix A using similarity transformations. As its name indicates, the algorithm uses the QR factorization: the matrix A is written as the product of an orthogonal matrix Q and an upper triangular matrix R,

    A = QR.    (4)

Such a factorization can be carried out in different ways, as discussed in [1]; the most used are Givens rotations and Householder transformations [7]. Nevertheless, computing the eigenvalues with the QR method is more efficient when it is applied to a tridiagonal matrix, which can be obtained through Householder transformations; as established in [7], this is the fastest and most accurate way to solve the complete eigenvalue problem. In [4] a relation

between the QR method and the Schur form is developed through the following relations. Let Q1 = U1; the matrix A1 can be calculated as

    A1 = Q1* A Q1 = R1 Q1.    (5)

The next step would be to calculate AQ1 and factorize it as U2 R2; but since A = Q1 R1, we can note that AQ1 = Q1 R1 Q1. If R1 Q1 is factorized as R1 Q1 = Q2 R2, we can note that

    AQ1 = U2 R2,    (6)

where U2 = Q1 Q2. Since Q2 is an orthogonal matrix, it performs an orthogonal transformation that corrects Q1 to U2. From equations (5) and (6) we can note that

    R1 Q1 = A1 = Q1* A Q1 = Q2 R2.    (7)

If the iteration is continued, now with the matrix A1, the correction factor for Q2 is obtained in the same way; carried out iteratively, these correction factors transform the matrix into a Schur form.
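To make the iteration concrete, the following is a self-contained sketch of one explicit, unshifted QR step A ← RQ on a small dense symmetric matrix, using Givens rotations for the factorization. It illustrates the similarity transformation only; it is not the paper's implementation, which works on the tridiagonal band in sparse storage and applies the shift strategies described next.

    #include <math.h>
    #include <stdio.h>
    #include <string.h>

    #define N 4

    /* One explicit QR iteration: factor A = QR with Givens rotations,
       then replace A by RQ = Q^T A Q, a similarity transformation. */
    static void qr_step(double A[N][N])
    {
        double Q[N][N] = {{0}}, R[N][N], T[N][N] = {{0}};
        for (int i = 0; i < N; i++) Q[i][i] = 1.0;
        memcpy(R, A, sizeof R);

        for (int j = 0; j < N - 1; j++)          /* zero the subdiagonal part */
            for (int i = j + 1; i < N; i++) {
                double r = hypot(R[j][j], R[i][j]);
                if (r == 0.0) continue;
                double c = R[j][j] / r, s = R[i][j] / r;
                for (int k = 0; k < N; k++) {    /* rotate rows j,i of R      */
                    double t1 =  c * R[j][k] + s * R[i][k];
                    double t2 = -s * R[j][k] + c * R[i][k];
                    R[j][k] = t1; R[i][k] = t2;
                }
                for (int k = 0; k < N; k++) {    /* accumulate Q (cols j,i)   */
                    double t1 =  c * Q[k][j] + s * Q[k][i];
                    double t2 = -s * Q[k][j] + c * Q[k][i];
                    Q[k][j] = t1; Q[k][i] = t2;
                }
            }

        for (int i = 0; i < N; i++)              /* A <- R*Q                  */
            for (int k = 0; k < N; k++)
                for (int j = 0; j < N; j++)
                    T[i][j] += R[i][k] * Q[k][j];
        memcpy(A, T, sizeof T);
    }

    int main(void)                               /* diagonal -> eigenvalues   */
    {
        double A[N][N] = {{4,1,0,0},{1,3,1,0},{0,1,2,1},{0,0,1,1}};
        for (int it = 0; it < 60; it++) qr_step(A);
        for (int i = 0; i < N; i++) printf("lambda %.6f\n", A[i][i]);
        return 0;
    }

Repeating the step drives the subdiagonal toward zero and the diagonal toward the eigenvalues, exactly the behavior the shift strategies below accelerate.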

As it stands, the QR algorithm is too inefficient to be an attractive tool for the calculation of eigenvalues; it must be refined so as to obtain better accuracy in a smaller number of iterations, that is, to make the matrix diagonalization more effective.

Strategy of single shift. As [8] points out, one approach is to take at each stage of the QR method a value sk that is a real approximation of an eigenvalue. The technique rests on the fact that a shift by an eigenvalue causes a deflation, so the shift sk is selected as an approximate eigenvalue; one choice is the element (k, k) of the matrix A(k). In [9], [1], Stewart has shown that this strategy produces quadratic convergence in the nonsymmetric case and cubic convergence in the symmetric case. Nevertheless, in the symmetric case Wilkinson provided heuristic reasons why an alternative shift, known as the Wilkinson shift, can be a better option. Wilkinson's idea is to select as shift value the eigenvalue of the 2 × 2 matrix

    A(k) = | a_{k-1,k-1}   a_{k-1,k} |
           | a_{k,k-1}     a_{k,k}   |    (8)

closest to the value a_{k,k}; a small sketch of this computation is given below.

Strategy of implicit shift. The QR algorithm with implicit shift is a technique similar to the Wilkinson shift, computing the eigenvalue of the 2 × 2 trailing block closest to the diagonal element. It was presented in [10]; the difference is that the QR transformation is carried out without subtracting the shift from the elements of the diagonal, as is done in the previous techniques.
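A minimal sketch of the Wilkinson shift computation, assuming the trailing 2 × 2 block of (8) is given by its diagonal entries d1, d2 and off-diagonal entry b; the closed form is the standard one found in [1], and the function and variable names are hypothetical.

    #include <math.h>

    /* Eigenvalue of [ d1 b ; b d2 ] closest to d2 (Wilkinson shift). */
    double wilkinson_shift(double d1, double d2, double b)
    {
        double delta = (d1 - d2) / 2.0;
        double sgn   = (delta >= 0.0) ? 1.0 : -1.0;
        return d2 - sgn * b * b / (fabs(delta) + sqrt(delta * delta + b * b));
    }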

3.1 Stop Criterion

The QR algorithm is an iterative process to find the eigenvalues of an n × n matrix A, and the process needs a criterion to indicate when an eigenvalue has been found. A criterion used in many implementations of the QR algorithm was proposed by Wilkinson: the element of the subdiagonal is compared with the neighboring elements of the diagonal. In the kth iteration, the element a_{i,i-1}^{(k)} is set to zero if it satisfies

    |a_{i,i-1}^{(k)}| ≤ u ( |a_{i-1,i-1}^{(k)}| + |a_{i,i}^{(k)}| ).    (9)

In this work the previous criterion has been used, taking u as the machine rounding unit.
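As a sketch, criterion (9) reduces to a one-line test when the tridiagonal matrix is kept as a diagonal array d[] and a subdiagonal array e[] (hypothetical names), with u taken as DBL_EPSILON:

    #include <float.h>
    #include <math.h>

    /* Criterion (9): e[i] = a_{i,i-1} is treated as zero when it is
       negligible next to its diagonal neighbours; the matrix then
       deflates at position i. */
    int negligible(const double *d, const double *e, int i)
    {
        return fabs(e[i]) <= DBL_EPSILON * (fabs(d[i - 1]) + fabs(d[i]));
    }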

4. NUMERICAL RESULTS

All the tests were run on Sun Microsystems machines, specifically a Sun Fire V440 server. This server has four UltraSPARC IIIi processors at 1.28 GHz and 8 GB of main memory, running the Solaris 10 operating system. The different algorithms for eigenvalue computing were developed in ANSI C inside the UCSparseLib library [11]. The optimization flags used were -fast, -xarch=v9 and -xopenmp; the last flag lets the compiler recognize the OpenMP [12] directives in the code.

4.1 Test Cases

The test matrices used in this work originate from industrial and scientific problems given in Matrix Market¹; in a similar way, tests were made with a matrix generated by the software CATIVIC [13] of the Venezuelan Institute for Scientific Research (IVIC). The test matrices were taken from different areas and different applications to verify the generality of the algorithms. CATIVIC is a quantum chemistry package specially developed to model reactions in the area of heterogeneous catalysis. The principal characteristics of these matrices are shown in Table 1.

Table 1. Test Matrices.

    Name       Size           Nonzero   Density
    bcsstm12   1473 × 1473     19659    0.0091
    bcsstk13   2003 × 2003     83883    0.021
    bcsstm10   1086 × 1086     22092    0.019
    plat1919   1919 × 1919     32399    0.0088
    mhd3200b   3200 × 3200     18316    0.0018
    matIVIC    1095 × 1095    332795    0.27

4.2 Implementation Details

Since the matrices are sparse, the UCSparseLib library, developed by [11], was used for their storage, saving memory by not storing the whole matrix. The storage format is based on the CRS/CCS formats.

¹ MATRIX MARKET version 3.0: http://math.nist.gov/MatrixMarket/

The method par excellence for eigenvalue computing is the QR method, or its variant the QL method; to assure convergence, the matrix should first be reduced to tridiagonal form. Thus, before applying QL, the tridiagonal matrix is obtained, and this step exploits shared-memory parallelism using threads from the OpenMP library. OpenMP is a parallel programming model for shared-memory multiprocessor machines based on a set of compiler directives; it supports a shared-variable programming model together with a run-time library and environment variables. A master thread executes the application and, when a parallel region is reached, a team of threads is spawned; when the region finishes, the threads are joined again. This is called the fork-join model.
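As an illustration of the fork-join model, the sketch below parallelizes the kind of dense inner kernel (a matrix-vector product over the active submatrix) that dominates Householder tridiagonalization. The function and variable names are assumptions for the example, not UCSparseLib's interface.

    #include <omp.h>

    /* Fork: the master thread reaches the parallel region and a team of
       threads splits the rows among themselves. Join: the implicit
       barrier at the end of the loop collects the team again. */
    void matvec_parallel(int n, const double *A, const double *x, double *y)
    {
        #pragma omp parallel for schedule(static)
        for (int i = 0; i < n; i++) {
            double s = 0.0;
            for (int j = 0; j < n; j++)
                s += A[(long)i * n + j] * x[j];
            y[i] = s;
        }
    }

Compiled with -xopenmp (or -fopenmp on other compilers), the loop runs with one chunk of rows per thread; without the flag the pragma is ignored and the code stays sequential, which is one of OpenMP's practical advantages.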

4.3 Discussion

Table 2 shows the run times to obtain the eigenvalues and eigenvectors of a symmetric matrix, a process carried out in two phases: first the matrix is transformed into a tridiagonal matrix using the Householder method, and then the QR method, or its equivalent QL, is applied to the main diagonal and the first subdiagonal, with some shift technique to improve the run time.

Table 2. Tridiagonalization Time + QR (seconds).

    Name       Tridiagonalization   Eigenvalues   Tridiagonalization   Eigenvalues and Eigenvectors
    bcsstm10           413.98           0.13             751.40              181.57
    bcsstm12          2040.78           0.16            3363.23              389.31
    bcsstk13          9167.30           0.29           10914.88              973.49
    plat1919          7434.93           0.22            8549.31              730.19
    mhd3200b         58747.20           0.76           65858.88             4980.99
    matIVIC            568.59           0.12             730.74              170.53

The process that consumes the greatest amount of CPU time is the transformation to tridiagonal form, whether computing only the eigenvalues or computing both eigenvalues and eigenvectors; the QL method is most affected when it has to compute the eigenvectors. The OpenMP standard is used following the fork-join model, reducing the times shown in Table 2; the results are shown in Tables 3 and 4, which also give the speed-up of the parallel algorithm with respect to the sequential algorithm for each case. Threads are applied in those cycles with the greatest consumption of processing time, which the parallel model reduces. In addition, we can note that the speed-up obtained for the calculation of eigenvalues is in some cases super-linear. Nevertheless, this speed-up is reduced when the eigenvectors are also computed, because during the tridiagonalization the transformations are stored at each step to ensure that the eigenvectors correspond to the original matrix. These transformations cause nonuniform accesses to the data when compact storage formats are used, which causes memory faults, so each nonuniform access incurs an overhead.

Table 3. Sequential vs Parallel (seconds). Eigenvalues Computing.

    Name       Sequential   Parallel   Speed-up
    bcsstm10       414.12     121.57       3.41
    bcsstm12      2040.95     428.24       4.76
    bcsstk13      9167.60    1471.24       6.23
    plat1919      7435.16    1209.99       6.14
    mhd3200b     58747.97   14631.57       4.02
    matIVIC        568.71     132.60       4.29

Table 4. Sequential vs Parallel (seconds). Eigenvalues and Eigenvectors Computing.

    Name       Sequential   Parallel   Speed-up
    bcsstm10       932.98     432.19       2.15
    bcsstm12      3752.55    1059.63       3.54
    bcsstk13     11888.38    3372.82       3.52
    plat1919      9279.51    2750.89       3.37
    mhd3200b     70839.87   21636.61       3.27
    matIVIC        901.27     430.76       2.09

5. CONCLUSIONS

In this work we have presented the main technique for computing eigenvalues and eigenvectors, applying the thread-based parallel model known as the fork-join model. The results compare the serial and parallel algorithms; the parallel algorithms use OpenMP threads for the tridiagonalization and the eigenvector computation. In these results we can observe that the execution time is reduced significantly: Tables 3 and 4 show that the speed-up of the parallel algorithm over the sequential algorithm is almost linear, and in some cases super-linear. With parallel programming we can reduce the overhead incurred by the use of compact storage formats. Section 4 shows that the UCSparseLib library can be applied to many areas and that, with the aid of the OpenMP library and a study of the algorithms to apply, the run times can be reduced in great proportion, increasing the performance of the software and hardware applied to such problems. In addition, tests with matrices of greater size were generated by the matrix generator module of UCSparseLib, which discretizes a second-order 3-D elliptic scalar operator using a finite difference method with a 7-point stencil. Using the UCSparseLib library these matrices could be stored, while they could not be stored using the LAPACK library. One of these tests was with an 8000 × 8000 matrix, for which the eigenvalue computation took about six days.

Acknowledgements

This work is granted by the University of Carabobo (UC), Valencia, Venezuela, through the Consejo de Desarrollo Científico y Humanístico (CDCH), project number CDCH-0350-2005.

REFERENCES

[1] G. H. Golub and C. F. van Loan, Matrix Computations, Third Edition, The Johns Hopkins University Press, Baltimore (1996).
[2] G. Strang, Algebra Lineal y sus aplicaciones, Second Edition, Fondo Educativo Interamericano (1982).
[3] G. H. Golub and H. A. van der Vorst, Eigenvalue computation in the 20th century, Journal of Computational and Applied Mathematics, 123: 35-65 (2000).
[4] H. A. van der Vorst, Computational Methods for Large Eigenvalue Problems, submitted for publication with Elsevier-North Holland (2000).
[5] E. Montagne and A. Ekambaran, An Optimal Storage Format for Sparse Matrices, Information Processing Letters, North-Holland, Vol. 90(2), pp. 87-92 (2004).
[6] J. G. F. Francis, The QR Transformation, Parts I and II, The Computer Journal, Vol. 4: 265-271 and 332-345 (1961).
[7] J. H. Wilkinson, Householder's Method for the Solution of the Algebraic Eigenproblem, The Computer Journal, Vol. 3: 23-27 (1960).
[8] C. D. Meyer, Matrix Analysis and Applied Linear Algebra, Society for Industrial and Applied Mathematics (2000).
[9] F. Tisseur, Backward Stability of the QR Algorithm, Technical Report N 239, UMR 5585 Lyon Saint-Etienne (1996).
[10] J. H. Wilkinson, C. Reinsch and H. Bowdler, The QR and QL Algorithms for Symmetric Matrices, Numerische Mathematik, Vol. 11: 293-306 (1968).
[11] G. Larrazábal, UCSparseLib: Una biblioteca numérica para resolver sistemas lineales dispersos, in Simulación Numérica y Modelado Computacional, SVMNI, TC19-TC25, ISBN: 980-6745-00-0 (2004).
[12] OpenMP Forum, OpenMP: A Proposed Industry Standard API for Shared Memory Programming, Technical Report, www.openmp.org (1997).
[13] F. Ruette, M. Sánchez, G. Martorell, C. González, R. Añez, A. Sierraalta, L. Rincón and C. Mendoza, CATIVIC: Parametric quantum chemistry package for catalytic reactions: I, International Journal of Quantum Chemistry, Vol. 96: 321-332 (2004).


