Parallel implicit predictor corrector methods

Applied Numerical Mathematics 42 (2002) 235–250 www.elsevier.com/locate/apnum

F. Iavernaro, F. Mazzia
Dipartimento Interuniversitario di Matematica, Università di Bari, via E. Orabona 4, I-70125 Bari, Italy

Abstract

The performance of parallel codes for the solution of initial value problems is usually strongly sensitive to the dimension of the continuous problem. This is due to the overhead related to the exchange of information among the processors and motivates the problem of minimizing the amount of communications. According to this principle, we define the so-called Parallel Implicit Predictor Corrector Methods, and in this class we derive A-stable, L-stable and numerically zero-stable formulas. The latter property refers to the zero-stability condition of a given formula when roundoff errors are introduced in its coefficients due to their representation in finite precision arithmetic. Some numerical experiments show the potential of this approach. © 2001 IMACS. Published by Elsevier Science B.V. All rights reserved.

Keywords: Initial value problems; Parallel block methods; Conditioning; Stability

1. Introduction

In this paper we define a suitable subclass of block-Boundary Value Methods (block-BVMs) to compute the numerical solution of initial value problems of the form

    y'(t) = f(t, y),   t ∈ [t_0, t_0 + T],   y(t_0) = y_0,   (1)

on parallel computers. Denoting by m the dimension of (1) and setting H = R^m, we have that y_0 ∈ H and f is a smooth function R × H → H. Concerning the general theory on BVMs we refer to [1,4]. Just like Runge–Kutta (RK) methods, block-BVMs require the solution of an s-stage nonlinear system to advance the solution in time. Here we are interested in investigating a parallelization-across-the-method approach, which consists of defining suitable algorithms that allow a uniform distribution, over a given number of processors, of the global work to be carried out for solving the nonlinear system. This is for

Work supported by cofin. 99 and GNCS.

* Corresponding author.

E-mail addresses: [email protected] (F. Iavernaro), [email protected] (F. Mazzia).

example accomplished using the technique based on similarity transformations introduced by Butcher [2] (used in PGAM [5]), or using linear or nonlinear splitting schemes as in the code PSIDE [10]. The final purpose is often that of obtaining the solution of the original nonlinear system as the limit of a sequence of block-diagonal linear systems that may be solved in parallel. Whatever the strategy adopted, an exchange of information among the processors is needed after the solution of each linear system, unless the original nonlinear discrete problem has a simplified structure. It is often observed that the global time spent in exchanging data among the processors before advancing the solution is not negligible with respect to the working time per step of each of the processors involved in the computation, unless the dimension m of the continuous problem is large enough. This overhead may cause a severe degeneration of the overall performance of parallel codes, as often shown by a gap between the expected and real speedups (see, for example, [5]). It is our aim to investigate suitable strategies that may allow an efficient parallelization even when the dimension of the continuous system is small and when working on distributed-memory machines. A further consideration that motivates such a study is the reasonable purpose of constructing a parallel code whose performance does not drastically depend on the features of the platform on which it is run. On the basis of the previous considerations, a reduction of communications among the processors seems to be of primary importance and will be a steady goal in the present work. The best that can be done from this point of view is represented by Parallel Block Methods (PBMs) [9], which require only one all-to-all communication per time step.
Unfortunately, A-stable PBMs of high order seem to suffer from an intrinsic ill-conditioning that causes a drawback in their convergence properties due to the use of finite precision arithmetic (see Section 3). Our study on this topic has driven us to the definition of Parallel Implicit Predictor Corrector Methods (PIPCMs) as a recipe for overcoming the problem of ill-conditioning. PIPCMs are a subclass of block-BVMs that, in our understanding, inherit some peculiarities of both PBMs and Parallel Diagonally Implicitly Iterated Runge–Kutta Methods (PDIRKs) [7,8]. The basic idea is to use a well conditioned PBM with a finite region of absolute stability and, depending on whether the IVP is stiff or not in the current time step, to correct the absolute stability deficiency by means of one or a set of correctors before accepting the solution. Obviously the use of the correctors must not introduce a significant increase in the computational complexity, and it must require a small amount of communications.
To fully grasp the motivations for the introduction of PIPCMs we have organized the paper as follows. In the next section we introduce a natural generalization of block-BVMs and establish their link to General Linear Methods (GLMs). Results on their conditioning are reported in Section 3 and used to state the inefficiency of high order A-stable PBMs. PIPCMs are introduced and studied in Section 4, where we also elucidate the way they are to be implemented. Some numerical experiments are presented in the last section. To simplify the notation, in the sequel we use matrices as linear operators H^s → H^s: if A ∈ R^{s×s} and Y ∈ H^s, AY will stand for (A ⊗ I_m)Y (I_m is the identity matrix of size m).

2. Block-Boundary Value Methods

A block-BVM numerically solves (stiff) initial value problems of the form (1) generating, at a generic step n of the integration procedure, a block vector

    Y_n = (y_1^{(n)}, y_2^{(n)}, . . . , y_s^{(n)})^T ∈ H^s,

which approximates the true solution at some given mesh times, say

    T_n = (t_1^{(n)}, t_2^{(n)}, . . . , t_s^{(n)})^T.

In our use of block-BVMs, the construction of Y_n will take information from the components of Y_{n−1}, namely y_1^{(n−1)}, y_2^{(n−1)}, . . . , y_s^{(n−1)} (see [1, Section 11.6]). More precisely, at each time step the solution of the following nonlinear system is required:

    A Y_n − hB F(Y_n) = δ,   where   δ = −Ã Y_{n−1} + hB̃ F(Y_{n−1}),

Ã, A, B̃, B are square matrices of dimension s with det(A) ≠ 0, and

    F(Y) = (f(t_1, y_1), f(t_2, y_2), . . . , f(t_s, y_s))^T

is the vector containing the evaluations of the function f at the points (t_i, y_i), i = 1, . . . , s. The following equivalent but more compact form of the previous system explicitly shows the relation between Y_n and Y_{n−1}:

    [Ã, A] [Y_{n−1}; Y_n] − h [B̃, B] [F(Y_{n−1}); F(Y_n)] = 0.   (2)

Block-BVMs are easily seen to belong to the class of General Linear Methods (GLMs). Indeed, setting

    P = −A^{−1}Ã,   Q = A^{−1}B,   R = A^{−1}B̃,

the previous system is recast as

    Y_n = P Y_{n−1} + hQ F(Y_n) + hR F(Y_{n−1}).   (3)
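As a small illustration of the step (2)–(3), consider the scalar linear test equation f(y) = λy, for which the nonlinear system collapses to a single linear solve with the matrix A − hλB. The sketch below (not from the paper) uses backward Euler written as a trivial s = 1 block-BVM; the coefficient matrices are illustrative assumptions.

```python
import numpy as np

# Backward Euler viewed as a trivial s = 1 block-BVM (illustrative toy choice):
# tilde-A = [-1], A = [1], tilde-B = [0], B = [1].
At, A = np.array([[-1.0]]), np.array([[1.0]])
Bt, B = np.array([[0.0]]), np.array([[1.0]])

def glm_matrices(At, A, Bt, B):
    """GLM form (3): P = -A^{-1} tilde-A, Q = A^{-1} B, R = A^{-1} tilde-B."""
    Ainv = np.linalg.inv(A)
    return -Ainv @ At, Ainv @ B, Ainv @ Bt

def linear_step(Y_prev, h, lam):
    """One step of (2) for f(y) = lam*y: (A - h*lam*B) Yn = (-At + h*lam*Bt) Y_prev."""
    return np.linalg.solve(A - h * lam * B, (-At + h * lam * Bt) @ Y_prev)

P, Q, R = glm_matrices(At, A, Bt, B)   # here P = [[1]]: the consistency eigenvalue
lam, h = -1.0, 0.1
Y = np.array([1.0])
for _ in range(10):                    # integrate y' = -y from y(0) = 1 up to t = 1
    Y = linear_step(Y, h, lam)
print(Y[0])                            # (1/1.1)**10 ~ 0.3855, near exp(-1) ~ 0.3679
```

For s > 1 the same solve is block-structured, and the matrices Ã, A, B̃, B carry the coupling pattern discussed below.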

This representation may be exploited to obtain convergence and stability results concerning a given formula. On the other hand, maintaining the original formulation as block-BVMs turns out to be more convenient for the construction and implementation of the methods themselves. By definition, in fact, a block-BVM is made up of the s variable stepsize linear multistep formulas (LMFs)

    Σ_{j=1}^{2s} α_{ij} y_j = h_i Σ_{j=1}^{2s} β_{ij} f(t_j, y_j),   i = 1, . . . , s.   (4)

The link between the two equivalent ways of describing a block-BVM, namely the s equations (4) and the system (2), is specified by the following settings:

    t_j = t_j^{(n−1)},  y_j = y_j^{(n−1)},   j = 1, . . . , s,
    t_j = t_{j−s}^{(n)},  y_j = y_{j−s}^{(n)},   j = s + 1, . . . , 2s,   (5)

    h = (1/s) Σ_{i=1}^{s} h_i,

    Ã = ( α_{ij} )_{i=1..s, j=1..s},   A = ( α_{i,s+j} )_{i=1..s, j=1..s},

and

    B̃ = (1/h) diag(h_1, . . . , h_s) ( β_{ij} )_{i=1..s, j=1..s},
    B = (1/h) diag(h_1, . . . , h_s) ( β_{i,s+j} )_{i=1..s, j=1..s}.

Summarizing, the component of index i of the system (2) is actually a LMF with a number of steps k ≤ 2s − 1 that approximates to a given order the solution at the time t_i^{(n)}. From (5) we deduce that a single step of a block-BVM covers a time interval of length sh, and the internal steps h̃_i = h_i/h define the mesh t̃_0 = 0, t̃_i = t̃_{i−1} + h̃_i, i = 1, . . . , s, over the interval [0, s], so that they play the same role as the abscissae in a RK formula or in a GLM. Since we are interested in implicit formulae with nonnull diagonal entries in B, we assume diag(B) = I_s as a normalization condition to avoid the indetermination over the coefficients α_{ij} and β_{ij} in (4). We will denote by ρ_i(z) and σ_i(z) the characteristic polynomials associated to the ith component of the system. A block-BVM may therefore be uniquely defined by the triples (ρ_i, σ_i, h_i), i = 1, . . . , s, or equivalently by [Ã, A], [B̃, B], [h_1, . . . , h_s]. Clearly, during the construction of a block-BVM one must take care that the s formulae (ρ_i, σ_i) are connected to each other by two constraints: they must be defined on the same set of mesh times [T_{n−1}, T_n], which in turn must satisfy t_i^{(n−1)} − t_{i−1}^{(n−1)} = t_i^{(n)} − t_{i−1}^{(n)} = h_i, i = 2, . . . , s. The latter condition requires that T_{n−1} and T_n share the same internal distribution of mesh times, enabling the block-BVM to exploit the previously computed solution at T_{n−1} to construct the new solution at T_n. The representation of the local truncation error comes directly from the theory of LMFs and, for a block-BVM of order p, assumes the form

    τ(h) ≡ [Ã, A] [Ŷ_{n−1}; Ŷ_n] − h [B̃, B] [F(Ŷ_{n−1}); F(Ŷ_n)] = h^{p+1} G(ξ),   (6)

where Ŷ_n is the true solution ŷ(t) evaluated at the internal mesh times T_n and

    G(ξ) = ( c̃_1 ŷ^{(p+1)}(ξ_1), . . . , c̃_s ŷ^{(p+1)}(ξ_s) )^T,

with c̃_i the error constant of the ith formula. A condition for the method (2) to have order p is to require that each component linear multistep formula (ρ_i, σ_i) has order p for i = 1, . . . , s. Subtracting (6) from (2) evaluated at Y_{n−1} = Ŷ_{n−1} yields

    A (Ŷ_n − Y_n) − hB (F(Ŷ_n) − F(Y_n)) = τ_n(h),

and hence Ŷ_n − Y_n = A^{−1} τ_n(h) + O(h^{p+2}). For this reason, in order to compare two different methods having the same order, it is preferable to consider c = A^{−1}(c̃_1, . . . , c̃_s)^T as the vector of error constants associated to the block-BVM.
The method (2) is zero-stable if the matrix P = −A^{−1}Ã has bounded powers. We observe that for a consistent method (order p ≥ 1) the matrix P always has the eigenvalue λ_1 = 1. From the theory of GLMs we have that a zero-stable block-BVM satisfying (6) is convergent of order p (see Theorem 3 in the next section or [2, Theorems 426C, 433A]). The application of a block-BVM to the linear test problem

    y' = λy,   y(0) = y_0,   (7)

gives rise to the linear autonomous equation

    Y_n = M(q) Y_{n−1},   q = hλ,

with M(q) = (A − qB)^{−1}(−Ã + qB̃). Denoting by ρ(M(q)) the spectral radius of M(q), the region of absolute stability is D = {q: ρ(M(q)) ≤ 1}. We also recall some standard definitions concerning stability:

    A-stability ⇔ C^− ⊂ D;
    L-stability ⇔ A-stability + ρ(M(q)) → 0 as q → ∞;
    A(α)-stability ⇔ {q: −α < π − Arg(q) < α} ⊂ D;
    L(α)-stability ⇔ A(α)-stability + ρ(M(q)) → 0 as q → ∞.

A first attempt to answer our final goal of finding methods, in the class (2), able to minimize the amount of communications needed to advance the solution is represented by the most natural choice, namely Parallel Block Methods [9]. The possibility of exploiting the entire block vector Y_{n−1} to construct the solution at the current mesh times T_n gives us the opportunity of designing block-BVMs in which the matrices A and B, which handle the implicitness, have a sparse pattern. This circumstance is attractive because a small density of nonzero entries in A and B will generate a sparse interconnection picture among the processors. PBMs are designed to produce the best effect in this direction, and therefore confer on the matrices A and B a diagonal structure. In this case no exchange of information is needed during the solution of the nonlinear system (2), which now consists of s uncoupled nonlinear subsystems of dimension m. It is worthwhile to observe that, on the basis of the two above-mentioned techniques to handle the nonlinear system, PBMs represent the best possible choice, because any other way of designing the matrices A and B will result in the requirement of communications at each step of the iteration procedure that computes the solution of (2).
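The region D can be explored numerically by sampling ρ(M(q)) on a grid of complex points. The sketch below (an illustration, not from the paper) again uses backward Euler as an assumed s = 1 toy block-BVM, for which M(q) = 1/(1 − q), and checks A- and L-stability by sampling.

```python
import numpy as np

def stability_radius(q, At, A, Bt, B):
    """Spectral radius of M(q) = (A - qB)^{-1}(-At + qBt); q lies in D iff it is <= 1."""
    M = np.linalg.solve(A - q * B, -At + q * Bt)
    return max(abs(np.linalg.eigvals(M)))

# Backward Euler as an s = 1 block-BVM (toy choice): M(q) = 1/(1 - q).
At, A = np.array([[-1.0]]), np.array([[1.0]])
Bt, B = np.array([[0.0]]), np.array([[1.0]])

# Sample C^-: every point should satisfy rho(M(q)) <= 1 (A-stability),
# and rho(M(q)) -> 0 as q -> -infinity (L-stability).
grid = [complex(x, y) for x in np.linspace(-10, -0.05, 25) for y in np.linspace(-10, 10, 25)]
print(all(stability_radius(q, At, A, Bt, B) <= 1.0 for q in grid))  # True
print(stability_radius(-1e6, At, A, Bt, B) < 1e-5)                  # True
```

The same sampling applied along the imaginary axis traces the boundary locus used in the figures below.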

3. Conditioning and convergence in finite precision arithmetic

PBMs do not introduce significant constraints on some basic properties that a method should satisfy to be competitive with other fully implicit methods. For instance, A-stable (or even L-stable) high order convergent PBMs may be constructed without any apparent difficulty (see, for example, [3,9]). All the same, there is a drawback that makes the use of high order PBMs unworkable: it consists of a severe ill-conditioning of the formulae which, in our experience, seems to be directly induced by the highly sparse structure of the matrices A and B, and which dramatically increases as higher orders of accuracy are imposed. This undesirable aspect produces a deterioration of the convergence properties of the method long before an accuracy in the solution comparable to the unit roundoff has been reached. This is of course inconsistent with the fact that high order formulae should operate when a small error is required in the numerical solution. The rest of this section is devoted to the study of this argument.

Example 1. Consider the fourth-order methods

    Ã_1 = [  1/4   −4/3    3     −4
             4/5   −15/4   20/3  −5
             5/3   −36/5   45/4  −20/3
             20/7  −35/3   84/5  −35/4 ],

    A_1 = diag(25/12, 77/60, 19/20, 319/420),   [B̃_1, B_1] = [0, I],   (8)

and

    Ã_2 = [ 0  0  0  −8/3
            0  0  0  −3/2
            0  0  0  −16/15
            0  0  0  −5/6 ],

    A_2 = diag(8/3, 3/2, 16/15, 5/6),

    B̃_2 = [ 0  1/9   −5/9   19/9
            0  1/2   −2     7/2
            0  6/5   −21/5  26/5
            0  20/9  −64/9  65/9 ],   B_2 = I,   (9)

defined on the constant stepsizes h_i = 1, i = 1, . . . , 4. The matrices P_1 = −A_1^{−1}Ã_1 and P_2 = −A_2^{−1}Ã_2 have eigenvalues λ_i^{(1)} and λ_i^{(2)} in the unit disk:

    |λ_i^{(1)}| ≈ 1, 0.69, 0.69, 0.16,   λ_i^{(2)} = 1, 0, 0, 0,

therefore they are convergent; the error constants are, respectively,

    c_i^{(1)} ≈ −0.096, −0.78, −3.1, −9.2,   c_i^{(2)} ≈ −0.026, −0.34, −1.7, −5.6.
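These spectral properties can be checked directly (a sketch using the coefficient matrices of Example 1 as printed above; numpy is assumed):

```python
import numpy as np

# Matrices of Example 1: method (8) on the left, method (9) on the right.
At1 = np.array([[1/4, -4/3, 3, -4],
                [4/5, -15/4, 20/3, -5],
                [5/3, -36/5, 45/4, -20/3],
                [20/7, -35/3, 84/5, -35/4]])
A1 = np.diag([25/12, 77/60, 19/20, 319/420])
At2 = np.zeros((4, 4)); At2[:, 3] = [-8/3, -3/2, -16/15, -5/6]
A2 = np.diag([8/3, 3/2, 16/15, 5/6])

P1 = -np.linalg.solve(A1, At1)
P2 = -np.linalg.solve(A2, At2)

# Consistency: each row of [At | A] sums to zero, so P @ 1 = 1 (eigenvalue 1).
print(np.allclose(P1 @ np.ones(4), 1.0), np.allclose(P2 @ np.ones(4), 1.0))
print(sorted(abs(np.linalg.eigvals(P1)), reverse=True))  # moduli reported as ~1, 0.69, 0.69, 0.16
print(sorted(abs(np.linalg.eigvals(P2)), reverse=True))  # exactly 1, 0, 0, 0
```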

Fig. 1 reports the boundary loci of the PBMs defined above (in these and subsequent figures the stability regions are those containing the complex numbers with negative real part in a neighbourhood of the origin). It shows that the method (8) is essentially A-stable (we could have obtained A-stability by slightly tuning the stepsizes h_i, but this was not essential for what this example is intended to prove), while the method (9) has a small region of absolute stability. We apply the methods to the test problem (7) with λ = −1, y_0 = 1 on the time interval [0, 1], using constant stepsizes. In Table 1, N is the number of points (h = 1/(4N)), E_N^{(1)} and E_N^{(2)} are the absolute error vectors, and the orders of convergence are estimated by the formula log_2(‖E_N^{(i)}‖∞ / ‖E_{2N}^{(i)}‖∞). It is easily deduced that in both cases the error decreases at a rate typical of a fourth-order formula

Fig. 1. Boundary loci of PBMs (8) on the left and (9) on the right.

Table 1
Comparison of the convergence properties of methods (8) and (9)

    N      ‖E_N^{(1)}‖∞   est. order (8)   ‖E_N^{(2)}‖∞   est. order (9)
    4      1.21e−04       –                1.64e−05       –
    8      2.31e−05       2.39             7.03e−07       4.54
    16     1.83e−06       3.65             3.69e−08       4.25
    32     1.12e−07       4.04             2.12e−09       4.12
    64     6.81e−09       4.04             1.27e−10       4.06
    128    4.20e−10       4.02             7.77e−12       4.03
    256    2.46e−11       4.09             4.83e−13       4.01
    512    1.15e−10       −2.23            2.87e−14       4.07
    1024   2.92e−10       −1.35            8.44e−15       1.77
    2048   4.90e−10       −0.75            3.20e−14       −1.92
until a saturation point has been reached: this is approximately 2e−11 for formula (8) and 8e−15 for formula (9). After that, increasing the number of points does not improve the accuracy of the computed solution. A comparison of these two limit errors with the floating point relative accuracy (ε ≈ 2e−16 in all our experiments) clearly shows that something pathological occurs in method (8), able to destroy its convergence properties.
Even though the construction of a convergent method is carried out by imposing the order and zero-stability conditions, due to the representation of the coefficients α_{ij} and β_{ij} by floating point numbers these conditions are no longer exactly satisfied on a computer. The two different behaviours of formulae (8) and (9) depend on how a method fails the zero-stability condition. In finite precision arithmetic the matrix P is actually represented by a perturbed matrix P̄ = P + δP, such that

    ‖δP‖∞ ≤ ε‖P‖∞.   (10)

For a convergent method possessing a matrix P with a single eigenvalue of unit modulus (λ_1 = 1) and the other eigenvalues away from the boundary of the unit disk, the loss of zero-stability is linked to the displacement of the corresponding eigenvalue λ̃_1 of the matrix P̄: this form of instability increases proportionally with the distance |λ̃_1 − λ_1|. Besides ‖δP‖∞, the size of the displacement is reasonably expected to be proportional to µ(S), the conditioning (in the infinity norm) of the matrix S that transforms P into diagonal form, that is, P = S^{−1}DS, with D the diagonal matrix containing the eigenvalues of P (for simplicity it is assumed that P admits such a similarity transformation). In the previous example we have µ(S_1)‖P_1‖∞ ≈ 9e+4 for method (8) and µ(S_2)‖P_2‖∞ = 3 for method (9). To stress the effect of the use of finite precision arithmetic, instead of Eq. (3) one can think of actually solving the perturbed equation

    Z_n = P̄ Z_{n−1} + hQ F(Z_n) + hR F(Z_{n−1}).   (11)

Since zero-stability involves the application of the method to (1) with f(t, y) = 0, the damaging effects of the conditioning of S can be easily shown in the simpler case F = 0 (see Theorem 3 for the general case). Eqs. (3) and (11) lead to the linear systems

    Y_n = P Y_{n−1}, Y_0 given,   and   Z_n = P̄ Z_{n−1}, Z_0 given.

The error vector E_n = Z_n − Y_n will therefore satisfy E_n = P̄ E_{n−1} + δP Y_{n−1}. For a convergent method the solution Y_n is bounded; we denote M = sup_{n≥0} ‖Y_n‖∞ and, for any matrix G ∈ R^{s×s}, we consider the norm ‖G‖ = ‖S^{−1}GS‖∞, which is induced by the vector norm ‖Y‖ = ‖S^{−1}Y‖∞, Y ∈ R^s. Assuming (at best) E_0 = 0 and observing that ‖δP Y_n‖ ≤ ‖S^{−1}‖∞ ‖δP‖∞ M, we obtain

    ‖E_n‖ ≤ ‖δP‖∞ ‖S^{−1}‖∞ M Σ_{i=0}^{n−1} ‖P̄^i‖.   (12)

The floating point relative accuracy ε will be involved both in the ‖δP‖∞ and in the ‖P̄^i‖ terms. As usual we shall conduct a first-order approximation analysis, which consists in neglecting the o(ε) terms in the expression of the error. Since

    ‖P̄‖ = ‖S^{−1}(P + δP)S‖∞ = ‖D + S^{−1} δP S‖∞ ≤ 1 + µ(S)‖δP‖∞,

it follows from (10) that

    ‖P̄^i‖ ≤ (1 + εµ(S)‖P‖∞)^i = 1 + O(ε).

Inserting this last inequality and again (10) into (12), we conclude that

    ‖E_n‖ ≤ nε‖P‖∞‖S^{−1}‖∞ M = ε‖P‖∞‖S^{−1}‖∞ M T / h.   (13)

This result is now extended to the more general case F ≠ 0. In particular, the following lemma and theorem represent an extension of the convergence result for GLMs that accounts for the representation of the marginally stable matrix P in finite precision. The techniques used in the proof of the theorem closely follow those used by Butcher to derive a similar result in exact arithmetic [2, Theorem 426C], the main difference being that the perturbed matrix P̄ may no longer be a stable matrix, and consequently the assumption "‖P^n‖ bounded" must be replaced by ‖P̄^n‖ ≤ ‖P̄‖^n ≤ (1 + εµ(S)‖P‖∞)^n. For this reason we omit part of the technical calculations.

Lemma 2. Consider the sequence {u_n} of vectors in R^s defined by the recurrence relation

    u_n = P̄ u_{n−1} + w_n,

where P̄ is a square matrix of dimension s such that ‖P̄‖ ≤ α, α ≥ 1, and the sequence {w_n} is such that

    ‖w_n‖ ≤ β‖u_n‖ + γ‖u_{n−1}‖ + δ,

with β, γ, δ nonnegative constants satisfying β + γ > 0 and β < 1. Then for any integer n the following bound holds true:

    ‖u_n‖ ≤ α^n ((1+γ)/(1−β))^n ‖u_0‖ + (δ α^n / (β+γ)) [((1+γ)/(1−β))^n − 1].   (14)

Proof. The same argument as in Lemma 425A of [2] may be exploited to obtain the assertion. □



Theorem 3. The error vector E_n satisfies the relation

    ‖E_n‖ = O(‖τ_n(h)‖/h) + O(ε‖P‖∞ µ(S)/h) (1 + O(‖τ_n(h)‖/h)).   (15)

Proof. Subtracting the perturbed equation (11) from (6) yields E_n = P̄ E_{n−1} + W_n, where

    W_n = hQ (F(Ŷ_n) − F(Z_n)) + hR (F(Ŷ_{n−1}) − F(Z_{n−1})) + τ_n(h) − δP Ŷ_{n−1}.

Fix h ≤ h_0, with h_0 L‖Q‖ < 1 (L is the Lipschitz constant of f), and observe that from consistency we have ‖Q‖ + ‖R‖ > 0. Set

    β = hL‖Q‖,   γ = hL‖R‖,   δ = max_{h≤h_0} ‖τ_n(h)‖,   η = ε‖P‖∞ µ(S),

and denote by M an upper bound of sup_n ‖Ŷ_n‖ (its existence comes from the regularity of the function f). The following bound for ‖W_n‖ is easily deduced:

    ‖W_n‖ ≤ β‖E_n‖ + γ‖E_{n−1}‖ + δ + ηM.

Now we are in the right position to apply the previous lemma and obtain

    ‖E_n‖ ≤ (1 + nη) ((1+γ)/(1−β))^n ‖E_0‖ + ((δ + ηM)(1 + nη)/(β+γ)) [((1+γ)/(1−β))^n − 1],   (16)

from which the assertion is easily deduced. □

Remark 4. Formula (16) could be slightly refined using, as in (12), the bound ‖δP Ŷ_n‖ ≤ ‖S^{−1}‖∞ ‖δP‖∞ ‖Ŷ_n‖∞ instead of ‖δP Ŷ_n‖ ≤ µ(S)‖δP‖∞ ‖Ŷ_n‖; the factor µ(S) would be involved anyway, for example from the first term in the bound (16).
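The mechanism behind Theorem 3 can be illustrated with a direct simulation (a sketch with hypothetical matrices, not taken from the paper): a stable P is perturbed by a coefficient-roundoff-sized δP, and the drift Z_n − Y_n grows much faster when the eigenvector matrix of P is ill-conditioned.

```python
import numpy as np

rng = np.random.default_rng(0)

def drift(P, n_steps=2000, eps=2.0**-52):
    """Iterate Yn = P Y_{n-1} and Zn = (P + dP) Z_{n-1} with ||dP|| ~ eps*||P||,
    mimicking the coefficient perturbation (10); returns the largest ||Zn - Yn||_inf."""
    dP = eps * np.linalg.norm(P, np.inf) * rng.uniform(-1, 1, P.shape)
    Y = np.ones(P.shape[0]); Z = Y.copy()
    worst = 0.0
    for _ in range(n_steps):
        Y, Z = P @ Y, (P + dP) @ Z
        worst = max(worst, np.linalg.norm(Z - Y, np.inf))
    return worst

P_good = np.zeros((4, 4)); P_good[:, 3] = 1.0   # eigenvalues 1, 0, 0, 0; well conditioned
P_bad = np.array([[1.0, 1.0e4],                 # eigenvalue 1 with nearly parallel
                  [0.0, 0.9]])                  # eigenvectors: large mu(S)
print(drift(P_good))   # stays at the n*eps level
print(drift(P_bad))    # several orders of magnitude larger
```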

4. Parallel Implicit Predictor Corrector Methods

From the result of Theorem 3 we deduce that, as seen in Example 1, the global error decreases with order p until the perturbation term becomes active. We emphasize once again that the term O(ε‖P‖∞µ(S)/h) will therefore produce a saturation point that the accuracy cannot overcome, and that may become harmful if it is far away from the floating point relative accuracy. A natural way to reduce the size of this point is to use higher precision arithmetic, that is, for example, quadruple precision instead of double precision (see [3]). This would obviously produce the undesirable effect of requiring much more costly elementary operations (for reasons of stability rather than accuracy), which in turn would significantly increase the computation times. A different option is to consider methods having a saturation point comparable to ε even at the price of weak absolute stability properties and then, if necessary, correct the solution by means of one or a set of correctors. On the basis of Theorem 3, our

aim is now to look at methods with the amplification factor ‖P‖∞ µ(S) of moderate size: this feature, in combination with zero-stability, will be referred to as numerical zero-stability.

Definition 5. A Parallel Implicit Predictor Corrector Method of order p is defined by a finite set of block-BVMs [Ã_i, A_i], [B̃_i, B_i], i = 0, . . . , r, satisfying the following properties:
(a) all the block-BVMs have order p (or at least p) and share the same distribution of internal stepsizes [h_1, . . . , h_s];
(b) A_0 and B_0 are diagonal matrices, that is, the first block-BVM is indeed a PBM;
(c) setting RA_i = A_i − A_0 and RB_i = B_i − B_0, the solution Y_n at the current time step is the block vector Z_ℓ, 0 ≤ ℓ ≤ r, computed by means of the first ℓ + 1 block-BVMs according to the following scheme:

    [Ã_i, A_0] [Y_{n−1}; Z_i] − h [B̃_i, B_0] [F(Y_{n−1}); F(Z_i)] = −RA_i Z_{i−1} + h RB_i F(Z_{i−1}),   (17)

for i = 0, . . . , ℓ, with Z_{−1} = 0.

Some considerations are needed to clarify in detail the introduction of PIPCMs.

4.1. Implementation of PIPCMs

Keeping in mind the result obtained in the previous section and our inability to find high order A-stable, numerically zero-stable PBMs, we drop the requirement of A-stability and look for a numerically zero-stable PBM of order p with a bounded region of absolute stability. If this region is large enough to handle the degree of stiffness of the continuous problem in the current time interval T_n, then we set Y_n = Z_0 and we are done; otherwise we apply the corrector [Ã_1, A_1], [B̃_1, B_1] in the way described by Eq. (17) for i = 1. Since A_1 and/or B_1 may be chosen to be full matrices, it is not difficult to make the resulting method numerically zero-stable with a wider region of absolute stability, and the same argument is repeated: the solution Z_1 is accepted or the new corrector [Ã_2, A_2], [B̃_2, B_2] must come into play. Obviously the method resulting from the application of all the ℓ + 1 formulae must be A-stable (and if possible L-stable).
This clarifies points (a) and (b) of Definition 5: a PIPCM applied to an IVP produces a pth order accurate solution, adjusting its absolute stability properties to locally handle the stiffness of the continuous problem during the integration procedure. We want to stress from now on that it is not correct to think that very stiff problems will therefore require all the block-BVMs to operate at every step of the computation. A deficiency in absolute stability may lead to a reduction of the stepsize when low accuracy is required; this situation is of course pathological and would cause a given number of correctors to become active. However, it is also true that, in general, low-accuracy solutions are computed by low order methods, for which the number of correctors ℓ is reduced to zero or one. Consequently, if we think of the final objective of constructing a code based on PIPCMs that performs an order variation strategy, this disadvantage does not seem to be so serious.
The motivation for point (c), which agrees with the analogous definition of PDIRKs, is the following: every implicit formula in (17) consists of the same set of s uncoupled nonlinear systems of dimension m, except for the known term in the right-hand side; only s parallel Jacobian evaluations and LU factorizations must therefore be computed if a modified Newton scheme is adopted to solve the nonlinear systems. No communication among the processors is needed during the solution of each block-BVM, and the total number of communications needed to advance the solution is ℓ + 1 ≤ r + 1.
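For the linear test equation (7), scheme (17) reduces to a cascade of solves with the single matrix A_0 − hλB_0 (diagonal, hence parallel). The sketch below implements this mechanic on a hypothetical s = 1 pair, backward Euler as PBM predictor and a rescaled trapezoidal rule as corrector; these toy matrices are illustrative assumptions, not methods from the paper.

```python
import numpy as np

def pipcm_step(methods, Y_prev, h, lam):
    """One PIPCM step, scheme (17), for the scalar test y' = lam*y.
    `methods` = [(At_i, A_i, Bt_i, B_i), ...]; index 0 must be the PBM predictor.
    Every stage solves with the same matrix A0 - h*lam*B0."""
    At0, A0, Bt0, B0 = methods[0]
    Z = None
    for At_i, A_i, Bt_i, B_i in methods:
        RA, RB = A_i - A0, B_i - B0              # RA_i = A_i - A0, RB_i = B_i - B0
        rhs = -(At_i - h * lam * Bt_i) @ Y_prev
        if Z is not None:                        # Z_{-1} = 0: no term at the first stage
            rhs -= (RA - h * lam * RB) @ Z
        Z = np.linalg.solve(A0 - h * lam * B0, rhs)
    return Z

# s = 1 toys: backward Euler as predictor, trapezoidal rule scaled so diag(B) = 1.
be = tuple(np.array([[v]]) for v in (-1.0, 1.0, 0.0, 1.0))
tr = tuple(np.array([[v]]) for v in (-2.0, 2.0, 1.0, 1.0))
Y, h, lam = np.array([1.0]), 0.1, -1.0
print(pipcm_step([be], Y, h, lam)[0])       # predictor only: 1/1.1 ~ 0.90909
print(pipcm_step([be, tr], Y, h, lam)[0])   # one correction: 1.09/1.21 ~ 0.90083
```

Note that one correction does not reproduce the corrector's own fixed point; it is a predictor-corrector iterate, exactly as in (17).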

4.2. Error constants

We denote by ĉ_ℓ and c_ℓ, ℓ = 0, . . . , r, the vectors of the error constants associated to the ℓth block-BVM [Ã_ℓ, A_ℓ], [B̃_ℓ, B_ℓ] and to the PIPCM, respectively; we want to find c_ℓ as a function of the ĉ_i's, i = 0, . . . , ℓ. Substituting the true solution [Ŷ_{n−1}^T, Ŷ_n^T]^T into (17) and subtracting from (17) itself yields

    A_0 (Ŷ_n − Z_i) − hB_0 (F(Ŷ_n) − F(Z_i)) = τ_i(h) − RA_i (Ŷ_n − Z_{i−1}) + h RB_i (F(Ŷ_n) − F(Z_{i−1})),

from which we get

    Ŷ_n − Z_i = A_0^{−1} τ_i(h) − A_0^{−1} RA_i (Ŷ_n − Z_{i−1}) + o(h^{p+1}).

It follows that

    c_i = ĉ_i − A_0^{−1} RA_i c_{i−1},   c_0 = ĉ_0,

and finally we obtain

    c_ℓ = Σ_{i=0}^{ℓ} (−1)^i (A_0^{−1} A_i − I)^i ĉ_{ℓ−i}.   (18)

In the case we choose [Ã_i, A_i] = [Ã, A], [B̃_i, B_i] = [B̃, B] for i = 1, . . . , r, that is, we adopt a unique corrector formula with error constants vector ĉ, we conclude that c_ℓ → ĉ for ℓ → ∞ if the matrix (A_0^{−1}A − I) has its eigenvalues inside the unit circle. In this situation, choosing ĉ with small components, we expect an improvement of the c_i's as i goes from zero to ℓ, even for small values of ℓ (see Example 6).

4.3. Stability

Applying the PIPCM (17) to the test equation (7), we have

    Z_i = −(A_0 − qB_0)^{−1}(Ã_i − qB̃_i) Y_{n−1} − (A_0 − qB_0)^{−1}(RA_i − qRB_i) Z_{i−1}.

It follows that Y_n = Z_ℓ = P_ℓ(q) Y_{n−1}, where the iteration matrix P_ℓ(q) assumes the following expression:

    P_ℓ(q) = −(A_0 − qB_0)^{−1}(Ã_ℓ − qB̃_ℓ)
             + Σ_{i=0}^{ℓ−1} (−1)^{ℓ−i+1} [ Π_{j=i+1}^{ℓ} (A_0 − qB_0)^{−1}(RA_j − qRB_j) ] (A_0 − qB_0)^{−1}(Ã_i − qB̃_i),

where the product is to be computed starting from j = ℓ and ending with j = i + 1. Zero-stability of each component method of the PIPCM requires that the matrix

    P_ℓ(0) = −A_0^{−1}Ã_ℓ + Σ_{i=0}^{ℓ−1} (−1)^{ℓ−i+1} [ Π_{j=i+1}^{ℓ} A_0^{−1}(A_j − A_0) ] A_0^{−1}Ã_i

is stable for ℓ = 0, . . . , r. For example, choosing A_i = A_0 for i = 1, . . . , ℓ, we derive P_ℓ(0) = −A_0^{−1}Ã_ℓ = −A_ℓ^{−1}Ã_ℓ, and then zero-stability of the ℓth component of the PIPCM will be a consequence of the zero-stability of the ℓth block-BVM. On the other side,

    P_ℓ(∞) = −B_0^{−1}B̃_ℓ + Σ_{i=0}^{ℓ−1} (−1)^{ℓ−i+1} [ Π_{j=i+1}^{ℓ} B_0^{−1}(B_j − B_0) ] B_0^{−1}B̃_i.

Again, B_i = B_0, i = 1, . . . , ℓ, implies P_ℓ(∞) = −B_0^{−1}B̃_ℓ = −B_ℓ^{−1}B̃_ℓ, and the PIPCM will preserve the same asymptotic properties as the corresponding pair [Ã_ℓ, A_ℓ], [B̃_ℓ, B_ℓ] as q → ∞. For example, L(α)-stability is obtained choosing B̃_ℓ = 0 or even strictly triangular.

Example 6. Consider

0 , B0 ] and for 0 , A0 ], [B again the fourth-order method (9) now denoted by [A 0 0

− 14

− 56

3 2

− 12

0 0

1 12 1 − 12 1 4

− 23

0

1 2

− 32

2 3 5 6

− 43

3

−4

0 0 0 0

1 12 1 − 12 1 4 25 12

   , 



 i , Bi = [0, I ]. B

(19)

It may be verified that the single corrector is L-stable with error constants 0.018, −0.022, 0.047, −0.24. The predictor (9) and the correctors (19) are put together to form a PIPCM. Fig. 2 reports part of the boundary locus when the PIPCM is applied with ℓ = 1 and ℓ = 2, while the error constants are, respectively,

    c_i^{(1)} ≈ 0.032, 0.072, 0.53, 1.2,   c_i^{(2)} ≈ 0.082, −0.10, −0.034, 0.28.

In both cases the PIPCM is numerically zero-stable (‖P‖∞ µ(S) = 3), L(α)-stable for ℓ = 1 and L-stable for ℓ = 2.

Fig. 2. Boundary locus of the PIPCM of Example 6 with ℓ = 1 (left) and ℓ = 2 (right).

5. Numerical tests

The topic of finding PIPCMs suitable for implementation in a code deserves further research and will not be an issue in this paper. In [6] we shall present some PIPCMs and experimental results which will give more evidence of their effectiveness. Here we are rather interested in stressing the way these methods are to be implemented, and the following example is in this direction. We consider a uniform internal mesh h_i = 1, i = 1, . . . , 4, the predictor

    Ã_0 = [ 0  0  0  −720/251
            0  0  0  −450/269
            0  0  0  −400/329
            0  0  0  −1575/1634 ],

    A_0 = diag(720/251, 450/269, 400/329, 1575/1634),

    B̃_0 = [  −19/251     106/251     −264/251    646/251
             −124/269     615/269    −1220/269   1360/269
             −460/329    2106/329    −3645/329    410/47
            −2560/817   11060/817   −17472/817    595/43 ],

    B_0 = I,

and a couple of correctors [Ã_1, A_1] and [Ã_2, A_2],

with B̃_1 = B̃_2 = 0, B_1 = B_2 = I. Each block-BVM has order 5, and the resulting PIPCM consists of three methods (r = 2): the pure predictor P (ℓ = 0) has a small finite region of absolute stability, the PC_1 method (ℓ = 1) is L(α)-stable and the PC_1C_2 method (ℓ = 2) is L-stable (see Fig. 3). Acting on the stepsizes h_i and on a number of coefficients in each formula it is possible to obtain L-stability using a single corrector (r = 1). However, we preferred to consider the above PIPCM to show that even in this unfavourable case we can get nice results. Table 2 shows the all-to-all communications (comm.) needed among the processors to advance the solution in time (that is, to construct Y_n from Y_{n−1}), the conditioning of the three methods defining the PIPCM, and their error constants. As mentioned in the previous section, to minimize the communications one can plan to use the methods P, PC_1 and PC_1C_2 concurrently. In short, one first applies the predictor; if the estimated error is smaller than the required precision, the solution is accepted. Otherwise one corrects the solution once, applying PC_1, and again estimates the error. One can then decide whether to apply PC_1C_2. To get an
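As a small consistency check in exact rational arithmetic (a sketch using Python's standard library), the first row of the predictor above can be verified to satisfy the order-5 conditions: the formula (720/251)(y_5 − y_4) = h[(−19/251)f_1 + (106/251)f_2 + (−264/251)f_3 + (646/251)f_4 + f_5] on the uniform nodes t = 1, . . . , 5 must be exact for all polynomials of degree at most 5.

```python
from fractions import Fraction as F

# Row 1 of the predictor, applied to p(t) = t^k and p'(t) = k t^(k-1), k = 0..5.
a = F(720, 251)
b = [F(-19, 251), F(106, 251), F(-264, 251), F(646, 251), F(1)]
nodes = [1, 2, 3, 4, 5]
for k in range(6):
    lhs = a * (5**k - 4**k)                     # alpha part applied to t^k
    rhs = sum(bj * k * t**(k - 1) for bj, t in zip(b, nodes)) if k > 0 else F(0)
    assert lhs == rhs                           # exact equality in the rationals
print("predictor row 1 satisfies the order-5 conditions")
```

The remaining rows can be checked in the same way, replacing the node of the implicit function evaluation by 6, 7 and 8, respectively.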

248

F. Iavernaro, F. Mazzia / Applied Numerical Mathematics 42 (2002) 235–250

Fig. 3. Boundary loci of the methods P, P+PC1 +PC1 (left) and PC1 , PC1 C2 (right). Table 2 Some features of the PIPCM defined in Section 5 method

all to all comm.

P ∞ µ(S)

P PC1 PC1 C2

1 2 3

7.46 9.37 9.62

error constants −1.88e−2, 2.46e−1, 8.90e−1,

−2.87e−1, 2.59e−1, 8.42e−1,

−1.65, 5.91e−1, 7.09e−1,

−6.19 1.50 9.72e−1

To get an idea of the reliability of this approach, we perform the following test. We consider the linear problem $y'(t) = Ly(t)$ on the time interval $[0, 100]$, with the matrix $L = TDT^{-1}$, where
$$T = \begin{pmatrix} 3 & 2 \\ 1 & 1 \end{pmatrix}, \qquad D = \begin{pmatrix} -1 & 0 \\ 0 & -20 \end{pmatrix}.$$
In particular, we find the minimum number of uniformly distributed mesh points such that the error, evaluated as
$$\max_{t_i} \frac{\|\hat{y}(t_i) - y_i\|_\infty}{1 + \|\hat{y}(t_i)\|_\infty},$$
is less than a fixed tolerance tol. Then we count the number of times each method has contributed to advance the solution. Table 3 shows the obtained results. Column 2 (labelled PIPCM) reports the number of points needed to solve the problem by the methods P, PC1 and PC1 C2 used together as described above; columns 3–5 show the percentage each method has been involved, and the average number of communications per step can be read in column 6. As a comparison, columns 7–9 report the number of points when the methods are used separately. To get a more concrete idea of the reliability of our approach, we also performed the test using the GBDF formula of order 5, denoted by GBDF5 (see the last column of the table). It is an L-stable block-BVM formula that only uses the last component of Yn−1 for the construction of Yn and has B = I and a full A [1]. We chose as initial condition y0 = [3, 1]T , that is, an eigenvector associated with the eigenvalue λ1 = −1. Consequently, the mode associated with the eigenvalue λ2 = −20 (the stiff component of the solution) will be absent from the continuous solution but will produce its effects on the discrete problem, encouraging the use of the correctors. In fact, because of their favourable stability properties, the methods PC1 and PC1 C2 will use rather large stepsizes, as they only have to reproduce, within the given


Table 3
Results for the test problem y' = Ly. Columns 2 and 7–10 report the number of steps required by each method used to provide an error in the solution within a specified value of the tolerance (first column)

tol      PIPCM   %P    %C1   %C2   comm./step   P      PC1    PC1 C2   GBDF5
1e−4     86      50%   49%   1%    1.51         1088   88     86       65
1e−5     133     41%   57%   2%    1.60         1089   151    133      113
1e−6     208     38%   60%   2%    1.63         1094   252    208      192
1e−7     352     48%   50%   2%    1.53         1099   413    329      315
1e−8     547     67%   31%   2%    1.35         1101   667    520      507
1e−9     874     88%   11%   1%    1.12         1528   1069   822      813
1e−10    1308    97%   3%    0%    1.03         2438   1710   1299     1297

tolerance, the non-stiff component of the solution. On the contrary, the predictor P has weak absolute stability properties that cause the use of very small stepsizes even when low accuracy is required. This is confirmed by the values in columns 7–9 of the table: note in fact that the number of points used by the predictor stays almost the same until an accuracy higher than 1e−8 is required (after that, the stepsize is reduced for accuracy reasons). The behaviour of the PIPCM seems very promising: it requires a number of points very close to that of the A-stable formula PC1 C2 while halving the number of communications per step (this will obviously also benefit the computational costs). In particular, the table reveals that the higher the accuracy required, the higher the contribution of the predictor in constructing the numerical solution. For tol = 1e−10 we notice that a very small contribution (3%) of PC1 allows the predictor to determine the solution with about half the points that would be required if it had been implemented alone. For large tolerances the predictor works outside of its absolute stability region, and therefore its solution needs corrections more frequently; however, in most cases it continues to play a dominant role. We also see that the overall performance of the PIPCM is similar to that of GBDF5, and thus a parallelization seems quite promising. It is worth noting that the application of a combination of the methods P, PC1 and PC1 C2 gives rise to a method with a stability region which is intermediate among those of the single methods. For example, for tol = 1e−6 the PIPCM eventually uses the sequence of methods P+PC1 +PC1 . This results in a method with iteration matrix P(q) = P0 (q)P1 (q)^2, whose stability region is reported in Fig. 3.

Acknowledgements

The authors are very grateful to D. Trigiante and P.J. van der Houwen for their comments and suggestions that improved the paper.

References

[1] L. Brugnano, D. Trigiante, Solving ODEs by Linear Multistep Initial and Boundary Value Methods, Gordon and Breach, Amsterdam, 1998.
[2] J.C. Butcher, The Numerical Analysis of Ordinary Differential Equations, Runge–Kutta and General Linear Methods, Wiley, Chichester, 1987.


[3] P. Chartier, L-stable parallel one-block methods for ordinary differential equations, SIAM J. Numer. Anal. 31 (1994) 552–571.
[4] F. Iavernaro, F. Mazzia, Block-boundary value methods for the solution of ordinary differential equations, SIAM J. Sci. Comput. 21 (1) (1999) 323–339.
[5] F. Iavernaro, F. Mazzia, On the extension of the code GAM for parallel computing, in: P. Amestoy, P. Berger, M. Dayde, I. Duff, V. Fraysse, L. Giraud, D. Ruiz (Eds.), Euro-Par '99 Parallel Processing, Proceedings of the 5th International Euro-Par Conference, Lecture Notes in Comput. Sci., Vol. 1685, Springer, Berlin, 1999, pp. 1136–1143.
[6] F. Iavernaro, F. Mazzia, Implementation of parallel implicit predictor corrector methods, in preparation.
[7] P.J. van der Houwen, B.P. Sommeijer, Iterated Runge–Kutta methods for parallel computers, SIAM J. Sci. Statist. Comput. 12 (5) (1991) 1000–1028.
[8] P.J. van der Houwen, B.P. Sommeijer, W. Couzy, Embedded diagonally implicit Runge–Kutta algorithms on parallel computers, Math. Comp. 58 (197) (1992) 135–159.
[9] B.P. Sommeijer, W. Couzy, P.J. van der Houwen, A-stable parallel block-methods for ordinary and integro-differential equations, Appl. Numer. Math. 9 (1992) 267–281.
[10] W.M. Lioen, J.J.B. de Swart, W.A. van der Veen, Test Set for IVP Solvers, Report NM-R9615, CWI, Department of Mathematics, Amsterdam, 1996.
