SEMPA: software engineering for parallel scientific computing


Software Engineering

Peter Luksch, Ursula Maier, Sabine Rathmayer, and Matthias Weidmann, Technical University at Munich

Friedemann Unger, AEA Technology GmbH

Sempa brings together researchers from computer science, mechanical engineering, and numerical analysis to develop software-engineering methods for parallelizing existing scientific-computing software packages. To define and evaluate these methods, the researchers implemented a parallel version of TfC, a computational fluid dynamics simulation program.

IEEE Concurrency, July–September 1997. 1063-6552/97/$10.00 © 1997 IEEE

The wide availability of interfaces such as PVM and MPI enables software engineers to write parallel software that is portable across a wide range of hardware platforms. Parallel processing, therefore, has become an attractive option for commercial software vendors. However, experience has shown that developing software for parallel platforms is still much less productive than writing sequential programs, because adequate methods and tools are not available.

Recently, many projects (Europort,1 for example) have developed parallel versions of large-scale software packages, most of them for scientific computing. However, these projects focused entirely on optimizing the performance of an individual application; the researchers did not attempt to derive methods from their experience that can be generalized to at least a certain class of problems.

The Sempa (software engineering methods for parallel applications) project focuses on software-engineering methods for the design of portable parallel software in scientific computing. As a real-world test case for defining and evaluating these methods, Sempa chose the parallel implementation of TASCflow for CAD (TfC), a state-of-the-art industrial computational fluid dynamics (CFD) simulation package. TfC operates on unstructured hybrid grids and uses an algebraic multigrid method to solve the linear equations obtained from finite-volume discretization.

In addition, Sempa researchers are demonstrating the potential of new languages and programming paradigms such as data parallelism and object orientation by reimplementing the algebraic multigrid solver (AMG),2 a key module in TfC, in Fortran 90, High Performance Fortran (HPF), and C++.

Sempa’s target platforms range from high-performance massively parallel processors (MPPs) to networks of workstations (NOWs). NOWs are of particular interest to small and medium companies and research laboratories. Efficient use of workstations for production runs of parallel applications requires automated resource management. Therefore, Sempa researchers (see the sidebar, “Project partners”) have also implemented a resource manager for batch execution of PVM programs. By using idle resources, parallel jobs execute in batch mode concurrently with interactive sessions.

Figure 1 summarizes Sempa’s objectives and their dependencies and mutual relationships.

Sidebar: Project partners

Sempa’s objectives have engendered an interdisciplinary approach that brings together partners with experience and competence in mechanical engineering, computer science, and numerical analysis:

• The Lehrstuhl für Rechnertechnik und Rechnerorganisation (chair for computer technology and computer architecture) at the Technical University at Munich (LRR-TUM) contributes know-how on parallel programming and tools for the design and analysis of parallel programs, as well as experience from several projects that have parallelized software from different domains of scientific computing.
• AEA Technology GmbH (formerly Advanced Scientific Computing (ASC)) develops and markets TfC, a computational fluid dynamics package applicable to a wide range of fluid problems. TfC solves the Navier-Stokes equations in three dimensions. The Sempa project parallelized TfC.
• The Institut für Computeranwendungen (ICA III) (Inst. for Computer Applications) at the University of Stuttgart researches adaptive multigrid methods. ICA has an advisory function in Sempa.
• Genias Software GmbH contributes Codine, a batch-queuing system for NOWs, which is the basis of the Sempa Resource Manager.

SEMPA’S GLOBAL PROJECT SCHEDULE

Figure 2 illustrates our strategy for achieving Sempa’s objectives, which consists of several interdependent tracks of activities. For each track, we have defined milestones, each of which results from a software-engineering module (SEM). An SEM is a piece of work that a small group of developers can accomplish in a couple of weeks or a few months.

Initially, the parallelization of TfC and the design of the resource manager are independent tracks of activities. When prototypes of both systems are available, we’ll use the parallel CFD code to study the resource manager’s behavior.

We started the investigation of new paradigms (data parallelism and object orientation) and languages (Fortran 90, HPF, and C++) as a separate track of activities, rather than redesigning the sequential program in a new language before starting parallelization. We considered the latter option too risky, given the project’s time and manpower restrictions, because the TfC software is very complex (more than 100,000 lines of Fortran 77 code). In addition, we initially didn’t know to what extent these paradigms, and their implementations in these languages, could successfully be applied to our problem. We therefore decided to start with the AMG as a case study for demonstrating the capabilities of the languages.

The definition of software-engineering methods and standards is closely related to the parallelization track, although no direct dependency of milestones exists between these tracks.

Figure 1. Sempa project objectives. The figure groups the objectives into three related areas:

• Software-engineering methods for parallel scientific computing: program analysis in interdisciplinary projects; standards for design documentation; portable message-passing programming (independence from hardware and message-passing library); new languages (Fortran 90, HPF); object-oriented scientific computing.
• Parallelization of TASCflow (real-world test case): increased performance with respect to speed (CPU) and modeling capability (memory); hardware platforms are networks of workstations (NOWs) and massively parallel processors (MPPs).
• Load balancing and resource management: dynamic load balancing at the application and the system level; batch execution of parallel production runs; dynamic resource management in NOWs to optimize utilization of workstations for interactive and batch operation.

The links between the areas are labeled “real-world case study,” “increased productivity, portability, and so on,” “virtual multiprocessor architecture, portability,” “test case: specification interface (application/resource manager),” and “optimized execution.”

Figure 2. Sempa’s global project schedule. The figure shows the tracks of activities and their milestones, each marked as done, in progress, not yet started, or future option:

• Parallelization of TfC: TfC (AEA); integration of partitioning; interpartition communication; parallel discretization; parallel AMG; ParTfC profiling; ParTfC batch-execution option; ParTfC with SRM.
• Software-engineering methods and guidelines: standard for design document; TfC design document; SWE guidelines (V. 1), (V. 2), and (V. i).
• Data parallelism: AMG in Fortran 90; AMG in HPF.
• Object orientation: define objects; AMG in C++; discretization in C++; OO TfC; OO ParTfC.
• Resource management: Codine batch queueing for NOWs (Genias); CoCheck transparent checkpointing and process migration for parallel programs (PVM) (LRR-TUM); Sempa Resource Manager (SRM); SRM profiling and evaluation.

Abbreviations used in Figure 2: AMG, algebraic multigrid solver; AEA, AEA Technology GmbH; LRR, Lehrstuhl für Rechnertechnik und Rechnerorganisation (chair for computer technology and computer architecture); PVM, Parallel Virtual Machine; SWE, software engineering; TfC, TASCflow for CAD; ParTfC, parallel TfC; TUM, Technical University of Munich.

Sidebar: The software-design process in parallel scientific computing

The vast majority of software projects in scientific computing are parallelizations of existing software.1 Ideally, their design cycle follows these six stages:

GLOBAL SOFTWARE REQUIREMENTS SPECIFICATION

A brainstorming group of software developers, hardware experts, managers, and users prepares the global software requirements specification (GSRS), which defines the project objectives. The GSRS defines the functionality of the program to be implemented, states the performance requirements, defines the target platforms for the software, and sets up a suite of test cases for validation and performance analysis. The GSRS also focuses on compatibility with existing pre- and postprocessing tools.

SEQUENTIAL-PROGRAM ANALYSIS AND DOCUMENTATION

In interdisciplinary projects, computer scientists must acquire the know-how from the problem domain that is necessary to understand the algorithms and their (sequential) implementation. A design document that summarizes the result of this analysis serves as a reference throughout the project. The document describes algorithms at the level of abstraction that is common in the problem domain. For example, it explains the algorithms in TfC in terms of nodes, finite volumes and elements, and so on, rather than in terms of arrays and subroutines. These problem-domain objects and the operations performed on them are related to the program’s data structures and subroutines. Such a high-level algorithmic description, which is expressed in pseudocode, provides a basic understanding of the model and a common “language” for interdisciplinary discussion. Also, the pseudocode representation helps distinguish algorithmic properties from implementation decisions in the sequential code.

IDENTIFICATION OF PARALLELISM

On the basis of the knowledge in the design document, computer scientists and application experts identify dependencies at the algorithmic level and possible approaches to parallelization. They determine the most efficient approach for the parallel program’s target architectures, and document the decision process. The solution that has been selected for implementation is formulated as pseudocode.

GLOBAL-STRATEGIC-PLAN DEVELOPMENT

Next, a global strategic plan (GSP) is set up to implement the adopted approach. The GSP defines the requirements of the software as a whole and decomposes the implementation process into several subtasks of manageable complexity, called software-engineering modules (SEMs). An SEM is characterized by a set of functional requirements and interface definitions. The GSP documents dependencies between SEMs and sets up a coarse-grained time schedule.

SEM IMPLEMENTATION

The implementation of each SEM follows a modification of the waterfall model.2 After each major stage of the waterfall model, project members from outside the developers group review the products of that stage. Depending on the review’s results, the design process either proceeds to the next stage or reiterates previous stages.

SYSTEM TESTING, PERFORMANCE ANALYSIS, AND MAINTENANCE

On completion of all SEMs, the computer scientists and application experts test the parallel program as a whole and analyze its performance. If it passes all the tests, it is released to the user community. User feedback will indicate bugs to be fixed, additional functionality to be added, and so on. A later release of the software will integrate further optimizations such as dynamic load balancing.

References

1. C. Cook, C.M. Pancake, and R. Walpole, “Are Expectations for Parallel Computing Too High? A Survey for Potential Parallel Users,” Proc. Scalable High Performance Conf., Washington D.C., 1994; http://epcc.ed.ac.uk/epcc-tec/documents/SC94/s94_survey_intro.html.
2. C. Ghezzi, M. Jazayeri, and D. Mandrioli, Fundamentals of Software Engineering, Prentice-Hall, Upper Saddle River, N.J., 1991.

Parallelization of TfC

The design cycle for our parallelization of TfC follows these six stages:

• Specify the global software requirements.
• Analyze and document the sequential program.
• Identify the parallelism.
• Set up the global strategic plan.
• Implement the SEMs.
• Test, analyze the performance of, and maintain the system.

For a detailed explanation of this cycle, see the sidebar, “The software-design process in parallel scientific computing.” We’ll now discuss how we’ve applied the first five stages in Sempa.

GLOBAL SOFTWARE REQUIREMENTS

Efficiency and scalability are obvious requirements in any parallelization project. Another primary requirement of the parallel version of TfC is portability. ParTfC must execute efficiently on NOWs and MPPs. It must also be compatible with TfC’s pre- and postprocessing tools. Appropriate default settings should relieve the user from having to care about issues related to parallel execution, such as starting PVM daemons or specifying hosts of the virtual machine. However, an advanced user should be able to customize the parallel environment. ParTfC has no specific functional requirements, except that it must compute the same results as TfC, of course.

ANALYSIS AND DOCUMENTATION

The Sempa project organized the analysis and documentation of TfC as a series of joint seminars of project members from AEA Technology GmbH and the Lehrstuhl für Rechnertechnik und Rechnerorganisation (chair for computer technology and computer architecture) at the Technical University at Munich. Design documentation was written at LRR-TUM and reviewed by AEA. Although TfC is well-structured and CFD developers’ documentation is available for most of the software, the analysis process was quite time-consuming because of TfC’s complexity. Moreover, the CFD developers’ documentation does not address many aspects that are relevant for parallelization.

The result of our analysis has been a rather heterogeneous collection of documents on different aspects of design and implementation. We integrated them into a comprehensive design document for which we have defined a standardized structure.3 These guidelines for design documentation apply to a wide range of application programs in scientific computing and beyond.

Figure 3. Element types (topologies) in TfC: hexahedral, wedge, tetrahedral, and pyramid.

IDENTIFICATION OF PARALLELISM

The analysis and documentation almost immediately resulted in the identification of parallelism. We quickly agreed on an approach to parallel implementation. TfC works with hybrid unstructured grids, which are constructed by filling space with an arbitrary combination of elements of four predefined types (see Figure 3). (Any such combination is a legal grid in TfC, but not necessarily a good one. Grid geometry strongly affects the quality of the simulation results and therefore is the responsibility of experienced engineers.)

Parallel implementation of TfC follows the SPMD (single-program, multiple-data) model. That is, the same algorithm executes on each node, and each process is assigned a partition of the data (the problem description). The grid is subdivided into (disjoint) partitions of grid nodes, which are assigned to the tasks of the parallel program. As a result, some elements are “cut,” as Figure 4 illustrates. A task P processes all elements that have at least one node in its (core) partition p; that is, cut elements are processed by more than one task.
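The element-assignment rule can be illustrated with a minimal Python sketch. It is not taken from ParTfC (TfC is Fortran 77 code); the function names, the dictionary-based grid representation, and the tiny three-element grid are assumptions made only for illustration.

```python
from collections import defaultdict


def assign_elements(elements, node_owner):
    """Map each task to the set of elements it has to process.

    elements   -- dict: element id -> tuple of global node numbers
    node_owner -- dict: global node number -> task owning that node
                  (the node's core partition)
    """
    task_elements = defaultdict(set)
    for elem_id, nodes in elements.items():
        # A task processes every element that has at least one node
        # in its core partition.
        for task in {node_owner[n] for n in nodes}:
            task_elements[task].add(elem_id)
    return dict(task_elements)


def cut_elements(elements, node_owner):
    """Return the elements whose nodes lie in more than one partition."""
    return {elem_id for elem_id, nodes in elements.items()
            if len({node_owner[n] for n in nodes}) > 1}


# Hypothetical 2D grid: two triangles and one quadrilateral, two tasks.
elements = {"e0": (0, 1, 2), "e1": (1, 2, 3), "e2": (3, 4, 5, 6)}
node_owner = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1, 6: 1}

task_elements = assign_elements(elements, node_owner)
cut = cut_elements(elements, node_owner)
```

In this example, element e1 has nodes in both partitions, so it is cut and appears in the element sets of both tasks.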

GLOBAL STRATEGIC PLAN AND IMPLEMENTATION OF SEMS

We implemented ParTfC by successively adding parallel components to the sequential code. The decomposition into SEMs follows this sequence:

• Partitioning,
• Interpartition communication,
• Parallel finite-volume discretization, and
• The parallel AMG.

Partitioning

We implemented partitioning by integrating the public-domain, general-purpose graph-partitioning package Metis into the code.4 We chose Metis because its algorithms have received good ratings5 and because it is available as public-domain software.

Interpartition communication

After each solver sweep, values φ_i must be communicated across partition boundaries. (φ is a placeholder for any physical property computed in ParTfC, for example, pressure and velocity (φ = (u_x, u_y, u_z, p)^T) or temperature (φ = T).) At initialization, each task P_i has to determine the lists of nodes, R_{i←j}, whose values it needs to receive from its neighbors P_j. The lists of nodes whose values have to be sent, S_{j→i}, are determined by communicating the R_{i←j} lists to the neighbor tasks (S_{j→i} = R_{i←j}).

Nodal values φ and gradients ∇φ are packed into messages sorted by global node numbers. (Gradients are communicated only on the AMG’s completion, that is, once per coefficient loop.) A message does not include node numbers, because they can be deduced from the node lists.
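The construction of the receive and send lists, and the packing of messages without node numbers, can be sketched as follows. This is a single-process Python illustration under stated assumptions: plain function calls stand in for the PVM messages, and all names (`receive_lists`, `send_lists`, `pack`, `unpack`) and the small grid are hypothetical, not ParTfC routines.

```python
def receive_lists(task, elements, node_owner):
    """R[j]: sorted global node numbers whose values `task` receives from task j."""
    recv = {}
    for nodes in elements.values():
        if task not in {node_owner[n] for n in nodes}:
            continue  # this element is not processed by `task`
        for n in nodes:
            owner = node_owner[n]
            if owner != task:
                recv.setdefault(owner, set()).add(n)
    return {j: sorted(ns) for j, ns in recv.items()}


def send_lists(task, all_recv):
    """S_{task->i} = R_{i<-task}: taken from the neighbors' receive lists."""
    return {i: lists[task] for i, lists in all_recv.items() if task in lists}


def pack(values, node_list):
    """Pack nodal values in node-list order; node numbers are not sent."""
    return [values[n] for n in node_list]


def unpack(message, node_list, values):
    """Store received values using the receiver's copy of the node list."""
    for n, v in zip(node_list, message):
        values[n] = v


# Hypothetical grid with two tasks; element e1 is cut.
elements = {"e0": (0, 1, 2), "e1": (1, 2, 3), "e2": (3, 4, 5, 6)}
node_owner = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1, 6: 1}

all_recv = {t: receive_lists(t, elements, node_owner) for t in (0, 1)}
send_0 = send_lists(0, all_recv)          # {1: [1, 2]}

phi_0 = {0: 10.0, 1: 11.0, 2: 12.0, 3: 0.0}
phi_1 = {1: 0.0, 2: 0.0, 3: 13.0, 4: 14.0, 5: 15.0, 6: 16.0}
message = pack(phi_0, send_0[1])          # task 0 -> task 1
unpack(message, all_recv[1][0], phi_1)    # task 1 updates its overlap nodes
```

Because sender and receiver hold the same node list sorted by global node number, the message carries only values, which matches the description above.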

Parallel finite-volume discretization

TfC implements its finite-volume discretization in an element-based manner. We parallelized discretization by having each task process all elements that have at least one node in its partition. For each element, the discretization procedure results in a local stiffness matrix whose coefficients represent the influence that the element’s nodes have on each other. The coefficients are assembled (that is, summed up) in a global stiffness matrix A that represents the system of linear equations that the AMG subsequently solves. A is very sparse and is structurally symmetric. Task P holds the rows a_i for all grid nodes i in its partition p. For elements in the overlap region, only those coefficients a_{i,j} are stored for which node i is in the local partition p.
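A minimal Python sketch of this row-wise assembly follows. The element matrix used here (a simple graph-Laplacian-like stencil) is a placeholder chosen only to make the example runnable; it is not TfC’s finite-volume discretization, and the function and variable names are hypothetical.

```python
def placeholder_element_matrix(nodes):
    """Stand-in local stiffness matrix: (m - 1) on the diagonal, -1 elsewhere."""
    m = len(nodes)
    return [[float(m - 1) if a == b else -1.0 for b in range(m)]
            for a in range(m)]


def assemble_local_rows(task, elements, node_owner):
    """Assemble the rows a_i of A for all nodes i owned by `task`.

    Returns a dict: row index i -> dict {column j: coefficient a_ij}.
    """
    rows = {}
    for nodes in elements.values():
        if all(node_owner[n] != task for n in nodes):
            continue  # element has no node in this task's partition
        k = placeholder_element_matrix(nodes)
        for a, i in enumerate(nodes):
            if node_owner[i] != task:
                continue  # row i is held by another task
            row = rows.setdefault(i, {})
            for b, j in enumerate(nodes):
                # Assembly: contributions of all elements are summed up.
                row[j] = row.get(j, 0.0) + k[a][b]
    return rows


# Hypothetical grid with two tasks; element e1 is cut.
elements = {"e0": (0, 1, 2), "e1": (1, 2, 3), "e2": (3, 4, 5, 6)}
node_owner = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1, 6: 1}

rows_0 = assemble_local_rows(0, elements, node_owner)
rows_1 = assemble_local_rows(1, elements, node_owner)
```

Task 0 processes the cut element e1 but stores only the coefficients for its own rows 1 and 2; the coefficients for row 3 are stored by task 1, which processes e1 as well.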

The parallel AMG solver

Task P’s AMG solves the equation system A^{(p)} φ^{(p)} = b^{(p)} with

\[
\begin{aligned}
\sum_{1 \le j \le N} a_{i,j}\,\phi_j &= b_i && \text{for nodes } i \text{ in } p,\\
\phi_i &= \phi_i' && \text{for nodes } i \text{ in the overlap region,}
\end{aligned}
\]

where N is the number of grid nodes, φ_k is the value of the physical property φ at grid node k, and φ_k′ is the φ_k value at overlap node k that has been received most recently from the neighbor task.

In ParTfC, the AMG’s smoother (a modified incomplete lower-upper, ILU(0), decomposition) acts only on the local partition of the matrix. So, processes do not interact inside the smoother. After each sweep of the smoother, φ values for the overlap region are communicated between neighboring processes.

Coarse-grid generation occurs the same way as in the sequential program, except that no coarse-grid blocks are formed that cross partition boundaries. Establishing coarse-grid connectivity across partition boundaries requires additional interaction between processes. As the coarse-grid topology changes dynamically (depending on the coefficient values), topology information for communication must be determined in every iteration of the coefficient loop for all coarse-grid levels. This requires information about the coarse-grid structure to be exchanged across partition boundaries. If the number of nodes at a given grid level falls short of a certain number (which can be set as a parameter), ParTfC switches to a direct solver that executes sequentially.
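The interplay of a purely local smoother and the exchange of overlap values after each sweep can be sketched in Python. Two simplifications are assumed and should be read as such: a Gauss-Seidel sweep stands in for ParTfC’s modified ILU(0) smoother, and a single-level iteration on a four-node 1D model problem replaces the multigrid cycle. All names are hypothetical, and dictionary copies stand in for messages.

```python
def local_sweep(rows, b, phi, own_nodes):
    """One Gauss-Seidel sweep over the locally owned rows.

    Values of overlap nodes in `phi` are not changed here; they act as
    the fixed values phi_i' received most recently from the neighbor.
    """
    for i in own_nodes:
        s = sum(a_ij * phi[j] for j, a_ij in rows[i].items() if j != i)
        phi[i] = (b[i] - s) / rows[i][i]


def exchange_overlap(phis, recv_lists):
    """Copy the current values of overlap nodes from the owning tasks."""
    for t, lists in recv_lists.items():
        for j, nodes in lists.items():
            for n in nodes:
                phis[t][n] = phis[j][n]


# Model problem: tridiagonal matrix (-1, 2, -1) with four nodes.
# Each task only ever reads the rows of its own nodes.
A = {0: {0: 2.0, 1: -1.0},
     1: {0: -1.0, 1: 2.0, 2: -1.0},
     2: {1: -1.0, 2: 2.0, 3: -1.0},
     3: {2: -1.0, 3: 2.0}}
b = {0: 1.0, 1: 0.0, 2: 0.0, 3: 1.0}      # exact solution: phi_i = 1

own = {0: [0, 1], 1: [2, 3]}              # core partitions
recv_lists = {0: {1: [2]}, 1: {0: [1]}}   # overlap nodes per neighbor
phis = {0: {0: 0.0, 1: 0.0, 2: 0.0},
        1: {1: 0.0, 2: 0.0, 3: 0.0}}

for _ in range(200):
    for t in own:
        local_sweep(A, b, phis[t], own[t])  # no interaction inside the sweep
    exchange_overlap(phis, recv_lists)      # communicate after each sweep
```

For this small, diagonally dominant model problem the iteration converges to the solution of the global system, although each sweep sees only slightly outdated overlap values.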

THE PARTFC PROTOTYPE

We presented the first prototype of ParTfC at the TASCflow users’ conference in May 1996. It implemented parallel finite-volume discretization, but did not yet exchange data inside the AMG solver. That is, the linear solver worked in a block-Jacobi-like manner. Computation converged for all test cases, but at a considerably slower rate than the sequential program. We’ve detailed this PVM-based implementation of ParTfC elsewhere.6 Based on our experience with ParTfC, we have iteratively updated and revised our software-engineering guidelines.

A fully parallelized ParTfC prototype including the parallel AMG is now available. It is implemented on top of PVM, but ports to MPI and Parix are also available. Parallel execution is completely transparent to the user, who does not have to care about starting and terminating PVM daemons. Also, all pre- and postprocessing tools can work with both the sequential and the parallel version of TfC.

First benchmark tests on an IBM SP2 show good parallel-performance results. (However, comprehensive profiling with an appropriate suite of test cases on different hardware platforms has just started.) For a test case using three grids (from 20,000 to 150,000 grid nodes) and one to six processors, the tests revealed that

• For six processors, the average wall-clock efficiency per time step increased from 60% for the smallest grid to 80% for the largest grid. We define wall-clock efficiency as E_w = T_1/(n T_n), where T_i is the wall-clock execution time on i processors and n is the number of processors. The average is taken over the number of time steps required for convergence.
• The amount of extra memory required for parallel execution scales with the size of the overlap regions. For the finest grid, 13% of all nodes are in the overlap region (see Figure 4).
• Parallelization does not degrade convergence. The number of time steps that TfC and ParTfC need to reduce the residual below a given threshold typically differs by only one or two. Real-world simulations require approximately 70 time steps.

Figure 4. Partitioning unstructured hybrid grids in ParTfC, for the 2D case. The figure shows four partitions (P1 to P4) with their overlap regions, nodes, and elements.

We’ve collected detailed profiling information to separately evaluate the performance of ParTfC’s modules (CFD computation, file I/O, administration, and so on). For the CFD module, we’ve separately profiled finite-volume discretization and the AMG. The test cases used in our first measurements are small compared to the problems ParTfC is designed to solve. These larger problems have more than 1,000,000 grid nodes; therefore, they cannot be solved sequentially on a typical workstation, because of lack of main memory. Our measurements indicate that efficiency improves as the number of grid nodes increases. So, we are optimistic about ParTfC’s performance in production use.
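As a worked example of the wall-clock efficiency measure, the short Python sketch below assumes the usual definition E_w = T_1/(n·T_n) and averages it over time steps. The timings are invented for illustration and are not measurements from the SP2 tests.

```python
def wall_clock_efficiency(t1, tn, n):
    """E_w = T_1 / (n * T_n) for one time step."""
    return t1 / (n * tn)


def average_efficiency(t1_steps, tn_steps, n):
    """Average E_w over the time steps of a run."""
    effs = [wall_clock_efficiency(t1, tn, n)
            for t1, tn in zip(t1_steps, tn_steps)]
    return sum(effs) / len(effs)


# Hypothetical per-time-step wall-clock times in seconds.
t1_steps = [12.0, 12.0, 12.0]   # one processor
t6_steps = [2.5, 2.5, 2.5]      # six processors

avg = average_efficiency(t1_steps, t6_steps, n=6)
```

With these invented numbers, a time step that takes 12 s sequentially and 2.5 s on six processors gives an efficiency of 12/(6 × 2.5) = 0.8, that is, 80%.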

Newer versions of Fortran

For several decades, Fortran has dominated scientific computing. Measured against state-of-the-art software-engineering technology, Fortran 77 has many disadvantages. These include a restricted set of data structures, no dynamic memory management, limited control constructs, and very limited support for data encapsulation and modular design. Newer standards such as Fortran 90 and HPF (which is based on Fortran 90) offer better support for software development. Therefore, one of Sempa’s goals has been to reimplement the TfC code in Fortran 90. We redesigned the data structures with respect to dynamic memory management and complex array operations. This redesign has improved the code’s readability and structure.

HPF also supports the development of SPMD programs. We have investigated the possibility of developing an HPF version of TfC. With the current state of HPF compilers, though, only programs containing regular data structures (that is, arrays) can be parallelized successfully.

Object-oriented scientific computing

Object-oriented analysis, design, and programming are common software-engineering techniques in computer science but not in traditional scientific computing. Fortran 77 was developed to address the efficient array-based computations that scientific-computing applications require. Today, most of these applications are still implemented in Fortran 77. There are various reasons for this: legacy Fortran 77 code has to be managed, compatibility between software modules is often necessary, existing efficient Fortran 77 scientific libraries must be used, and, last but not least, the learning effort to adopt object-oriented software development scares people.

The use of Fortran 77 implies a procedural software-development technique. Object-oriented software engineering can improve the quality of scientific-computing software by increasing reliability, enhancing maintainability, and therefore shortening the software product’s development time. Yet, object-oriented software engineering still has to prove its applicability to scientific computing. It must provide the efficient software implementation on which computationally intense scientific-computing applications rely.

The Sempa project has worked toward these goals by applying object-oriented software-development techniques to TfC. TfC consists of the discretization part and the AMG, which are both candidates for an object-oriented redesign. First results of a C++ redesign of the AMG show increased quality of the C++ AMG code. The C++ code is easier to read and understand; it is much closer to the pseudocode we used in our design document to describe the behavior of routines; and we’ve implemented data encapsulation. The increase in quality is due to a pseudocode-like implementation resulting from object-oriented techniques.7 Such an implementation will let numerical specialists concentrate on the implementation of a numerical feature or method instead of managing memory in workspaces. The object-oriented pseudocode-like implementation is robust in the case of future changes, because of the three basic features of object orientation: encapsulation, polymorphism, and inheritance. This distinguishes this AMG implementation from a (theoretical) procedural pseudocode-like implementation.

Unfortunately, execution time increased disastrously. However, this increase was implementation-dependent and not related to the use of object-oriented techniques. Most of the proposed optimizations have been implemented and show promising execution times.8 TfC’s other part, finite-volume discretization, is about to be redesigned in C++, too. Eventually, we will achieve an object-oriented parallel TfC.

Resource management for NOWs

An automatic resource-management system for NOWs should include functions for batch queuing and load balancing to utilize idle workstations and minimize the runtime of parallel applications.9 In the Sempa project, we’ve designed and implemented the Sempa Resource Manager, which maps parallel applications to idle or low-loaded hosts, controls the processes of parallel applications, and migrates processes of a parallel application at runtime if the load situation is imbalanced.

To implement the resource manager’s basic functionalities, we’ve used existing components: Codine, a batch-queueing system for NOWs, handles job scheduling, and CoCheck handles checkpointing and task migration.10 CoCheck lets the resource manager migrate individual tasks at runtime, from a host claimed for interactive use, to an idle host. The Sempa Resource Manager is available only for PVM applications because a special PVM feature, the PVM resource-manager interface, joins the functions of Codine and CoCheck. Running an application under the resource manager’s control requires no source-code modification; relinking the program with a specific library is sufficient. We’ve used ParTfC to validate the Sempa Resource Manager’s correctness and efficiency.

A further objective in Sempa is to develop an application-oriented approach for load balancing. Such an approach uses information from the parallel application (for example, the time needed to complete an iteration, or the load values the application can get using system calls) for migration decisions and load migration. Load migration in this case means process migration with user-defined checkpoints or repartitioning of the computational grid.
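The two decisions described above, mapping tasks to idle or low-loaded hosts and moving tasks away from hosts claimed for interactive use, can be illustrated with a small Python sketch. The policy shown (pick the least-loaded hosts below a threshold) is an assumption made for illustration; it is not the algorithm of the Sempa Resource Manager, Codine, or CoCheck, and all names and load values are hypothetical.

```python
def select_hosts(load, n_tasks, max_load=0.5):
    """Pick the n_tasks least-loaded hosts whose load is below max_load."""
    candidates = sorted((l, h) for h, l in load.items() if l < max_load)
    if len(candidates) < n_tasks:
        raise RuntimeError("not enough idle or low-loaded hosts")
    return [h for _, h in candidates[:n_tasks]]


def plan_migrations(placement, interactive, load, max_load=0.5):
    """Plan moves of tasks from interactively used hosts to idle hosts.

    placement   -- dict: task id -> host currently running the task
    interactive -- set of hosts claimed for interactive use
    Returns a dict: task id -> (source host, target host).
    """
    used = set(placement.values())
    free = sorted((l, h) for h, l in load.items()
                  if h not in used and h not in interactive and l < max_load)
    moves = {}
    for task, host in sorted(placement.items()):
        if host in interactive and free:
            _, target = free.pop(0)
            moves[task] = (host, target)
    return moves


# Hypothetical load values (fraction of CPU in use) for four workstations.
load = {"ws1": 0.1, "ws2": 0.9, "ws3": 0.2, "ws4": 0.05}

hosts = select_hosts(load, n_tasks=2)
placement = dict(enumerate(hosts))            # task id -> host
moves = plan_migrations(placement, interactive={"ws1"}, load=load)
```

In this example the two tasks start on ws4 and ws1; when ws1 is claimed for interactive use, the task running there is scheduled to move to ws3, the only remaining low-loaded host.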

In designing ParTfC, we have concentrated on increasing its computational power. The next step is to consider visualization. We will replace the traditional approach of postprocessing (that is, all tasks write their results to a single file) with a more interactive approach. This research is beyond Sempa’s scope but will be addressed in a future project.

As soon as we’ve implemented the different approaches in resource management (resource-managed ParTfC with CoCheck checkpointing versus application-level checkpointing), we’ll compare their efficiency and portability, using ParTfC.

The redesign of AMG in C++ has shown that object-oriented software-development methods can be successfully applied to real-world scientific-computing applications. The C++ implementation will be ported to Java. The Java version will be a case study for future research addressing high-performance network computing.

ACKNOWLEDGMENTS

The German Federal Department of Education, Science, Research, and Technology (BMBF, http://www.bmbf.de) funded this work. The Sempa project’s Web page is at http://wwwbode.informatik.tu-muenchen.de/parallelrechner/applications/sempa/.

REFERENCES

1. “HPCN at Large for Industrial Application: The ESPRIT Initiatives EUROPORT, EUROPORT-D1995”; http://www.gmd.de/SCAI/europort/.
2. M.J. Raw, “A Coupled Algebraic Multigrid Method for the 3D Navier-Stokes Equations,” in Fast Solvers for Flow Problems, Proc. 10th GAMM-Seminar, Notes on Numerical Fluid Mechanics, Vol. 49, Vieweg Verlag, Wiesbaden, Germany, 1995, pp. 204–215.
3. F. Unger et al., “A Framework for Design Documentation in Interdisciplinary Scientific Computing Software Projects,” Tech. Report TUM-96-04, Technical Univ. at Munich, Inst. for Informatics, Munich, Germany, 1996; http://wwwbode.informatik.tu-muenchen.de/Par/appls/info/proj/sempa/publications/TUM-96-04.ps.gz.
4. G. Karypis and V. Kumar, “METIS: Unstructured Graph Partitioning and Sparse Matrix Ordering System,” Univ. of Minnesota, Minneapolis, Minn., 1995; http://www.cs.umn.edu/~karypis/metis/metis.html.
5. R. Diekmann and R. Preis, “Statische und dynamische Lastverteilung für parallele numerische Algorithmen” (static and dynamic load distribution for parallel numerical algorithms), in Software Engineering in Scientific Computing, W. Mackens and S.M. Rump, eds., Vieweg Verlag, 1996, pp. 128–134.
6. P. Luksch et al., “Parallelization of a State-of-the-Art Industrial CFD Package for Execution on Networks of Workstations and Massively Parallel Processors,” Proc. EuroPVM 96: Third European PVM Users’ Group Meeting, Lecture Notes in Computer Science, Vol. 1156, Springer-Verlag, Berlin, 1996; http://wwwbode.informatik.tu-muenchen.de/archiv/artikel/europvm96/europvm96.ps.gz.
7. M. Weidmann, “Object-Oriented Redesign of a Real-World Fortran 77 Solver,” in Modern Software Tools for Scientific Computing, E. Arge, A.M. Bruaset, and H.P. Langtangen, eds., Birkhäuser, Basel, Switzerland, 1997.
8. M. Weidmann, “Design and Performance Improvement of a Real-World Object-Oriented C++ Solver with STL,” to be published in Proc. 1997 Int’l Scientific Computing in Object-Oriented Parallel Environments Conf. (Iscope ’97), Springer-Verlag, 1997.
9. U. Maier and G. Stellner, “Distributed Resource Management for Parallel Applications in Networks of Workstations,” Proc. HPCN Europe 1997, Lecture Notes in Computer Science, Springer-Verlag, 1997, pp. 462–471; http://wwwbode.informatik.tu-muenchen.de/archiv/artikel/hpcn97/hpcn97.ps.gz.
10. G. Stellner and J. Pruyne, “Providing Resource Management and Consistent Checkpointing for PVM,” Proc. Second European PVM Users’ Group Meeting, Editions Hermes, Lyon, France, 1995, pp. 131–136; abstract of article at http://www.cs.cmu.edu/afs/cs/project/nectar-adamb/pvm95/stellner.html.

Peter Luksch heads the parallel and distributed applications research group at the Chair for Computer Technology and Computer Architecture at the Technical University at Munich. He is also a member of Special Research Grant 342, “Tools and Methods for Using Parallel Computer Architectures,” funded by the German Science Foundation. His research interests include parallel and distributed computing and software-engineering methods for parallelization of large-scale applications in scientific computing. He received his PhD in computer science from the Technical University at Munich. Contact him at LRR-TUM, Institut für Informatik, Hauspost: SAB, D-80290 München, Germany; [email protected]; http://wwwbode.informatik.tu-muenchen.de/~luksch/.

Ursula Maier is a PhD student in computer science at the Technical University at Munich. Her research interests include resource management and load balancing in networks of workstations, parallelization of scientific-computing applications, and parallel computation in high-speed networks. Contact her at LRR-TUM, Institut für Informatik, Hauspost: SAB, D-80290 München, Germany; maier@informatik.tu-muenchen.de; http://wwwbode.informatik.tu-muenchen.de/~maier/.

Sabine Rathmayer is a PhD student in computer science at the Technical University at Munich. Her research interests include automatic and interactive parallelization and tools, High Performance Fortran, and online visualization and computational steering of parallel HPC applications. Contact her at LRR-TUM, Institut für Informatik, Hauspost: SAB, D-80290 München, Germany; [email protected]; http://wwwbode.informatik.tu-muenchen.de/~maiers/.

Matthias Weidmann is a PhD student in computer science at the Technical University at Munich and a research assistant in the Sempa Project. His research interest is in object-oriented high-performance computing. He received his Master of Computer Science from the Technical University of Munich. Contact him at LRR-TUM, Institut für Informatik, Hauspost: SAB, D-80290 München, Germany; [email protected]; http://wwwbode.informatik.tu-muenchen.de/~weidmann/.

Friedemann Unger is a software developer in computational fluid dynamics at AEA Technology GmbH, where his main responsibility is AEA’s contribution to the Sempa project. He received his D.-Ing. in mechanical engineering from the Technical University of Munich. Contact him at AEA Technology GmbH, D-83624 Otterfing, Germany; [email protected].
