Massively Parallel Conformal FDTD on a BlueGene Supercomputer


R. Mittra, Wenhua Yu, and M. R. Hashemi
Electromagnetic Communication Laboratory, Pennsylvania State University
319 EE East, University Park, PA 16802
Email: [email protected]

D. N. de Araujo, M. Cases, N. Pham, E. Matoglu
IBM System and Technology Group
11400 Burnet Rd., Austin, TX 78758
Email: dearaujo, cases, npham, [email protected]

P. Patel, B. Herrman
IBM System and Technology Group
3039 Cornwallis Rd., Research Triangle Park, NC
Email: pravin, [email protected]

Abstract

This paper presents modeling and simulation on a massively parallel supercomputer utilizing a parallel conformal Finite Difference Time Domain (PFDTD) [1] code developed at the Electromagnetic Communication (EMC) Laboratory of Pennsylvania State University. Parallelization, scalability, and application to the modeling of high-end server electrical interconnects are examined.

Introduction

Ever-increasing electrical signaling speeds, together with growing interconnect complexity, place an increased burden on designers to provide solutions that are cost effective and manufacturable. The International Technology Roadmap for Semiconductors (ITRS) projects the need for numerical methods for 3D modeling to double every two years from 2003 through 2009 [2]. While the focus on processor operating frequency has diminished due to thermal and power-delivery challenges, designers are increasing performance through thread-level parallelism (TLP) and chip-level parallelism (CLP). Companies such as IBM, Intel, AMD, Sun, Broadcom, Freescale, and VIA, among others, have enabled multithreading and/or multiple CPU cores per processor to increase system performance. Novel architectures and computing models, such as the massively parallel BlueGene, can provide very high computational capability at a fraction of the power and cooling required by off-the-shelf supercomputers [3]. The complexity of structures, higher frequencies, and higher-density interconnects prompt the need for efficiency, parallelism, and accurate results.

FDTD and Parallelism

The original Yee algorithm [4] for the Finite Difference Time Domain (FDTD) method for solving Maxwell's equations has been modified by Yu and Mittra [1] to handle arbitrary objects without introducing the staircasing errors present in conventional FDTD. This enables the FDTD to handle curved objects with an accuracy comparable to that achieved by finite element frequency-domain codes such as HFSS. The conformal version of the FDTD code (CFDTD) was recently parallelized by Yu et al. [5] to enhance its capability to solve large problems in a numerically efficient manner. (Note that an FEM code, or its time-domain counterpart FETD, is not easily parallelized.) The Parallelized Conformal Finite Difference Time Domain (PFDTD) software package utilizes the MPI library to carry out the field exchange between neighboring processors in a highly efficient and robust way. The code also includes an automatic conformal mesh-generation algorithm that itself uses parallel processing, as well as support for lumped elements, matched terminations, sub-gridding, and the collection of meshing and simulation results. The FDTD update equations for (Ex, Hx) have the following form:

$$E_x^{n}(i,j,k) = E_x^{n-1}(i,j,k) + \frac{c\,\Delta t}{\varepsilon\,\Delta y}\left(H_z^{n-1/2}(i,j,k) - H_z^{n-1/2}(i,j-1,k)\right) - \frac{c\,\Delta t}{\varepsilon\,\Delta z}\left(H_y^{n-1/2}(i,j,k) - H_y^{n-1/2}(i,j,k-1)\right)$$

$$H_x^{n+1/2}(i,j,k) = H_x^{n-1/2}(i,j,k) - \frac{c\,\Delta t}{\mu\,\Delta y}\left(E_z^{n}(i,j+1,k) - E_z^{n}(i,j,k)\right) + \frac{c\,\Delta t}{\mu\,\Delta z}\left(E_y^{n}(i,j,k+1) - E_y^{n}(i,j,k)\right)$$

Figure 1 – Yee's Algorithm

Similar equations can be written for the other components. When computing the E-field and H-field, data needs to be exchanged only between adjacent cells, as seen in Figure 2. This locality of information exchange makes the FDTD algorithm highly suitable for parallel methods. The efficiency of parallelism is determined by many factors. Amdahl's law [6] dictates the maximum speed-up that can be achieved, given the serial/parallel nature of the runtime. The serial/parallel runtime depends on several factors, such as the overhead of parallelization, load balancing, and communication overhead/efficiency. BlueGene has five interconnect networks for I/O, debug, and inter-processor communication: a 3D torus, a Fat Tree, Gigabit Ethernet, Fast Ethernet, and JTAG [7]. The 3D torus is particularly efficient for the parallel FDTD method.

Figure 2 – Data Exchange Required Only for Adjacent Cells

For a 3D structure with an orthogonal grid, the information exchange for adjacent cells is contained in six directions, as shown in Figure 3 (a) and (b).

Figure 3 – (a) FDTD 3D data exchange requirements, (b) BlueGene Torus interconnection networks
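The local, nearest-neighbor nature of these updates is what makes FDTD so amenable to domain decomposition. As a minimal single-process sketch (not the PFDTD implementation; the array names, normalized units, and uniform grid are illustrative assumptions), the Ex update above can be written with NumPy array slicing:

```python
import numpy as np

def update_ex(Ex, Hz, Hy, c_dt, eps, dy, dz):
    """Apply one Ex update of Yee's algorithm on a uniform grid.

    Only cells with j >= 1 and k >= 1 are updated, since the update
    references the (j-1) and (k-1) neighbors; those boundary planes are
    exactly what neighboring subdomains must exchange in a parallel run.
    """
    Ex[:, 1:, 1:] += (c_dt / (eps * dy)) * (Hz[:, 1:, 1:] - Hz[:, :-1, 1:]) \
                   - (c_dt / (eps * dz)) * (Hy[:, 1:, 1:] - Hy[:, 1:, :-1])
    return Ex
```

In the parallel code, each MPI rank would perform this update on its own subdomain after exchanging the H-field planes on its faces with the six neighboring ranks, which maps naturally onto the BlueGene 3D torus.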

Modeling and Simulation

One of the principal advantages of the Parallel Conformal Finite Difference Time Domain simulator is the relative ease of creating and simulating geometries. In the process of creating and preparing geometries, there are some simple yet very important rules regarding the relationship between the cell size and the geometry size that the user needs to be aware of and follow. The two most critical rules are that (a) the cell size should be smaller than one fifteenth of the shortest wavelength, and (b) the finest structure in the domain should span at least three cells in each direction. Neglecting either of these rules may produce inaccurate results. Several factors contribute to the complexity of a Finite Difference Time Domain (FDTD) simulation. The minimum feature size determines the cell size, which in turn determines the duration of each time step for the simulation. The size of the structure determines the time duration of the simulation, which combined with the time-step duration gives the number of time steps. There is no exact way to calculate the required number of time steps, but as a good initial estimate the following equation can be used:

$$\text{number of time steps} = \frac{\dfrac{2L}{v} + \dfrac{1}{f}}{\Delta t}$$

In this equation, L is the length of the geometry in which the wave is propagating, v is the propagation velocity, f is the frequency of the excitation pulse, and Δt is the estimated time step that is provided by the code. Depending on the geometry (for example, highly resonant structures), more time steps might be required. In the FDTD algorithm, desired parameters such as voltages, currents, and related quantities are derived from the E- and H-fields. The field values at the last time step of the simulation are saved and can be used to continue the simulation, obtaining more results while saving time. The Level of Effort (LOE) is a simulation-time metric used to determine the runtime of a simulation and can be calculated by:
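As a sketch, the meshing rule and the time-step estimate can be combined numerically. The structure length, excitation frequency, and the 3-D Courant stability limit used for Δt below are illustrative assumptions (the PFDTD code itself supplies Δt):

```python
import math

def estimate_time_steps(L, v, f, dt):
    """Initial estimate of the FDTD time-step count: two traversals of
    the structure plus one period of the excitation, divided by dt."""
    return math.ceil((2 * L / v + 1 / f) / dt)

# Illustrative numbers: a 0.1 m structure in vacuum excited at 10 GHz.
c = 3.0e8                         # propagation velocity, m/s
dx = (c / 10e9) / 15              # rule (a): cell < shortest wavelength / 15
dt = dx / (c * math.sqrt(3.0))    # standard 3-D Courant limit for a cubic cell
n_steps = estimate_time_steps(0.1, c, 10e9, dt)
```

The estimate is deliberately generous: it allows the excitation to traverse the structure twice plus one full excitation period, which is usually enough for non-resonant geometries.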

LOE = (A million cells) x (B thousand time steps) x k

The efficiency of the PFDTD code has been demonstrated on several platforms, including BlueGene/L (shown below), where better than 90% efficiency has been achieved. For a given structure, the run time increases linearly with the cell count and with the number of time steps, as shown in Figures 4 and 5, respectively.

Figure 4 – Simulation Time vs. Cell Count
Figure 5 – Simulation Time vs. Number of Time Steps
(Both figures show linear fits: y = 0.016x + 0.7074 with R² = 1, and y = 0.0611x + 25.778 with R² = 0.9991.)

Scalability data from 1 to 16 processors, and from 8 to 500 processors, were obtained and are shown in Figures 6(a) and 6(b), respectively.


Figure 6 – BlueGene PFDTD Scaling: (a) Run Time vs. Number of Processors from 1 to 16 (power-law fit y = 1494.2x^-0.9462, R² = 0.9996), (b) Run Time vs. Number of Processors from 8 to 500, case test_y_* (power-law fit y = 19496x^-0.863, R² = 0.9958)

The LOE can be estimated accurately a priori because the runtime is linear in both the number of cells and the number of time steps in the simulation. By contrast, the simulation time of other methods such as FEM is difficult to predict from structure to structure at different frequencies, due to the unpredictable nature of adaptive mesh convergence. The scalability of this method is bounded by the ratio of cells to processors: the communication time to adjacent cells grows until, in the limit, each processor is assigned a single cell.
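Amdahl's law, cited earlier, bounds the attainable speed-up, while the measured scaling in Figure 6 follows a power law in the processor count. A sketch of both models (the 5% serial fraction is a hypothetical illustration, not a measured value):

```python
def amdahl_speedup(p, serial_fraction):
    """Amdahl's law: maximum speed-up on p processors when a fixed
    fraction of the runtime is inherently serial."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)

def power_law_runtime(p, a, b):
    """Empirical scaling model t = a * p**(-b), the form fitted in
    Figure 6; an exponent b close to 1 indicates near-ideal scaling."""
    return a * p ** (-b)

# A hypothetical 5% serial fraction already caps the speed-up well below p:
s16 = amdahl_speedup(16, 0.05)  # roughly 9x on 16 processors
```

The fitted exponents of 0.9462 and 0.863 in Figure 6 correspond to run time dropping nearly in proportion to the processor count, consistent with the better-than-90% efficiency quoted above.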

Validation and Results

To validate the Parallel Conformal FDTD code, a plane resonant structure was chosen and run on both PFDTD and HFSS, a commercial frequency-domain field solver. The structure is shown in Figure 7. The geometry consists of two parallel plates with a dielectric slab between them; it acts as a cavity resonator and resonates as a function of frequency. The simulation results are shown in Figure 8. Figure 8(a) shows the S-parameter results obtained with PFDTD after extrapolation, and Figure 8(b) shows the S-parameters obtained with HFSS. As is apparent from both plots, the resonant frequencies in the two simulations are the same; in fact, the two sets of results show a high degree of agreement.

Figure 7 – Plane Resonant Test Case (plate dimensions 3000 mil x 7000 mil; 1 mil and 15 mil layer thicknesses; Port 1: 50, 50; Port 2: 1500, 3500)

Figure 8 – S-Parameters: (a) Using PFDTD, (b) Using HFSS


Due to the windowing of the time-domain response, the FDTD peaks are not as sharp, but the resonant frequencies are captured well. The closed-form solutions for the resonances of this problem are circled in red in Figure 8(a).
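The closed-form resonances of a rectangular parallel-plate cavity can be computed directly; a sketch follows. The dielectric constant is not stated in the text, so eps_r = 4.0 below is an assumed illustrative value, and the plate dimensions are taken from Figure 7:

```python
import math

def plate_resonances(a, b, eps_r, max_mode=3):
    """Closed-form resonant frequencies (Hz) of a rectangular parallel-plate
    cavity of lateral dimensions a x b (meters):
        f_mn = c / (2 * sqrt(eps_r)) * sqrt((m/a)**2 + (n/b)**2)."""
    c = 299792458.0
    freqs = []
    for m in range(max_mode + 1):
        for n in range(max_mode + 1):
            if m == 0 and n == 0:
                continue  # the (0,0) mode is not a resonance
            freqs.append(c / (2.0 * math.sqrt(eps_r)) * math.hypot(m / a, n / b))
    return sorted(freqs)

MIL = 25.4e-6  # meters per mil
# eps_r = 4.0 is an assumption; the paper does not state the dielectric constant.
fs = plate_resonances(3000 * MIL, 7000 * MIL, eps_r=4.0)
```

With these assumed values the lowest mode falls near 420 MHz; the actual circled resonances in Figure 8(a) depend on the real dielectric constant of the slab.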

Summary

The high correlation between the PFDTD and HFSS results demonstrates that PFDTD is a reliable tool that provides results comparable to those of other methods, while also giving good insight into time-domain behavior as well as the frequency domain. Beyond this versatility, one of the principal advantages of PFDTD is its parallel mechanism, which makes it possible to tackle extremely large problems in a reasonable time on parallel platforms. The power of PFDTD is further enhanced on the BlueGene system, as is evident from the excellent scalability presented herein and in Attachment A. Illustrative results of additional investigations applying the FDTD to packaging problems are included in Attachment B.

Acknowledgements

The authors would like to express their appreciation to Tao Su and Yongjun Liu of the Electromagnetic Communication Laboratory, Pennsylvania State University, for their assistance with the PFDTD code. Thanks are also due to Dr. Robert Walkup of the IBM T.J. Watson Research Center and Dr. Y. Joanna Wong of IBM Advanced Technical Support, Deep Computing Solutions, for their help with porting the code to BlueGene. Finally, we acknowledge the IBM Faculty Grant, sponsored by Moises Cases of the IBM System and Technology Group in Austin.

References

1. Wenhua Yu and Raj Mittra, CFDTD: Conformal Finite Difference Time Domain Maxwell's Equations Solver, Software and User's Guide, Artech House, December 2003, ISBN 1580537316.
2. International Technology Roadmap for Semiconductors, Semiconductor Industry Association, 2003, Modeling and Simulation, p. 21. Available from: http://public.itrs.net.
3. A. Gara et al., "Overview of the Blue Gene/L System Architecture," IBM Journal of Research and Development, Vol. 49, No. 2/3, March/May 2005, pp. 195-212.
4. K. S. Yee, "Numerical Solution of Initial Boundary Value Problems Involving Maxwell's Equations in Isotropic Media," IEEE Trans. on Antennas and Propagation, Vol. 14, No. 3, May 1966, pp. 302-307.
5. Wenhua Yu, Raj Mittra, Tao Su, and Yongjun Liu, "A Robust Parallelized Conformal Finite Difference Time Domain Field Solver Package Using the MPI Library," IEEE Antennas and Propagation Magazine, Vol. 47, No. 3, 2005 (to appear).
6. G. M. Amdahl, "Validity of the Single-Processor Approach to Achieving Large-Scale Computing Capabilities," Proceedings of the AFIPS Conference, Reston, VA, 1967, pp. 483-485.
7. N. R. Adiga et al., "Blue Gene/L Torus Interconnection Network," IBM Journal of Research and Development, Vol. 49, No. 2/3, March/May 2005, pp. 265-276.
