P-RIO: An Environment for Modular Parallel Programming


Enrique Vinicio Carrera E.
Coppe - UFRJ, Cx. Postal 68511, Rio de Janeiro - RJ, 21945-970, Brazil
[email protected]

Orlando Loques
CAA - UFF, Rua Passo da Pátria, 156, Niterói - RJ, 24210-240, Brazil
[email protected]

Julius Leite
CAA - UFF, Rua Passo da Pátria, 156, Niterói - RJ, 24210-240, Brazil
[email protected]

Abstract

This paper presents the P-RIO environment, which offers high-level, but straightforward, concepts for parallel and distributed programming. A simple software construction methodology makes most of the useful properties of object-oriented programming technology available, facilitating modularity and code reuse. This methodology promotes a clear separation of the individual sequential computation components from the interconnection structure used for the interaction between these components. The mapping of the concepts associated with the software construction methodology to graphical representations is immediate. P-RIO includes a graphical programming tool, has a modular construction, is highly portable, and provides run-time support mechanisms for parallel programs in architectures composed of heterogeneous computing nodes.

Keywords: Distributed and parallel computing, object-oriented programming, code reuse, graphical programming.

Introduction

Parallel and distributed architectures are essential to support high-performance computer applications. Great progress has been achieved at the processor and interconnection hardware levels, and many powerful multicomputer architectures, based on high-speed interconnection technologies, are now available. In order to exploit parallelism, the software of these systems is based on sequential pieces of computation that act concurrently and interact for communication and synchronization. In most message-passing based programming environments, the interactions are specified through explicit language constructs embedded in the text of the program modules. As a consequence, when the interaction patterns are not trivial, the overall program structure is hidden, making its understanding and performance optimization difficult. In addition, there is little regard for properties such as reuse, modularity and software maintenance, which are of great concern in the software engineering area.

In this context, P-RIO† tries to offer high-level, but straightforward, concepts for parallel and distributed programming. A simple software construction methodology makes most of the useful properties of object-oriented programming technology available, facilitating modularity and code reuse.

† P-RIO stands for Parallel - Reconfigurable Interconnectable Objects

This methodology promotes a clear separation of the individual sequential computation components from the interconnection structure used for the interaction between these components. Besides the language used for programming the sequential components, a configuration language is used to describe the program composition and its interconnection structure. This makes the data and control interactions explicit, simplifying program visualization and understanding. The mapping of the concepts associated with the software construction methodology to graphical representations is immediate. Hence, P-RIO includes a graphical tool that provides system visualization features that help to configure, monitor and debug a parallel program. In principle, the methodology is independent of the programming language, of the operating system and of the communication architecture adopted. The support environment has a modular construction, is highly portable, and provides development tools and run-time support mechanisms for parallel programs in architectures composed of heterogeneous computing nodes. Our current implementation is message-passing oriented, which had a strong influence on the presentation of this paper. However, we also provide some insights on its application in a distributed shared memory environment.

P-RIO Concepts

The P-RIO methodology hinges on the configuration paradigm [1, 2], whereby a system can be assembled from externally interconnected modules obtained from a library. The concept of module (figure 1) has two facets: (i) a type or class, if it is used as a mould to create execution units of a parallel program; (ii) an execution unit, called a module or class instance. Primitive classes define only one thread of control, and instances created from them can execute concurrently. Composition caters for module reuse and encapsulation; that is to say, existing (primitive and composite) classes can be used to compose new classes (figure 2). Configuration comprises the selection and naming of class instances and the definition of their interconnections.

Figure 1. A Typical Module Icon (a module with in ports and out ports)

The points of interaction and interconnection between instances are defined by named ports, which are associated with their parent classes. An in port (figure 1) defines a passive partner in an interaction, while an out port defines an active partner that can initiate interactions. Ports themselves are classes that can be reused in different module classes. A port instance is named in the context of its owner class. This provides configuration-level naming independence, helping the reuse of port classes. The external interface of a composite class is formed by the ports that were explicitly exported by its internal classes. This helps to hide the internal composition of a class, making visible only the ports useful for external configuration.

Figure 2. An Example of a Composite Module

Transaction Styles

Closely associated with each port is a transaction style. We use the term transaction to name a specific control and data interaction rule between ports. Besides the interaction rule, the style also defines the implementation mechanisms required to support the particular transaction. Typical unidirectional and bi-directional message-passing transactions are immediate examples of styles. The configuration description language provides a link construct that is used to specify the connection of ports belonging to different modules. The port concept allows the connections to be checked for consistency (transaction style matching) at the configuration domain.

Our basic transaction set includes a bi-directional synchronous (remote-procedure-call-like) and a unidirectional asynchronous (datagram-like) transaction style. The synchronous transaction can be used in a relaxed fashion (deferred synchronous), without blocking, allowing the overlap of remote calls with local computations. The associated responses can be collected later on, in a different order from that of the calls. This basic transaction set can be extended or modified in order to fulfill different requirements of parallel programs. The first version of P-RIO was implemented using the standard PVM library [3]. Currently, we are considering an MPI-based implementation. In this case, the different semantic modes (defined by the underlying protocols) associated with MPI point-to-point transactions could be selected at the configuration level. Other transaction styles, such as the highly optimized RPC mechanism described in [4], based on active messages, could be included in our environment and selected for specific port connections.

It should be noted that the transaction concept does not imply commitment to low-level communication architectures. For example, transactions between modules located in the same node can move data through direct memory-to-memory transfer. In our PVM-based implementation, UDP or TCP protocols, as well as multicast dissemination, can be used to support the transactions. In architectures based on specialized interconnection structures, such as the IBM SP/2, their particular communication mechanisms could be used for transaction support.
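At the programming level, a deferred synchronous transaction could be used along the following lines. This is a minimal sketch in C: the primitive names prio_call and prio_collect are illustrative assumptions of ours, since the actual invocation syntax of the P-RIO primitives is not fixed in this paper.

/* Sketch of a deferred synchronous (RPC-like) transaction through an
 * out port. The prio_call/prio_collect names are assumptions used
 * only for illustration; they are not the actual P-RIO API. */
#include <stdio.h>

/* Assumed primitives: start a call on an out port without blocking,
 * returning a handle; later collect the reply for that handle. */
extern int  prio_call(const char *out_port, const void *args, int len);
extern void prio_collect(int handle, void *reply, int len);

int main(void)
{
    int    range[2] = { 0, 1000 };   /* in arguments  */
    double partial;                  /* out argument  */

    /* Start the remote call; do not block waiting for the reply. */
    int h = prio_call("output[0]", range, sizeof range);

    /* Overlap the outstanding call with local computation. */
    double local = 0.0;
    for (int i = 1; i <= 1000; i++)
        local += 1.0 / i;

    /* Collect the reply later on, possibly out of calling order. */
    prio_collect(h, &partial, sizeof partial);

    printf("local = %f, remote = %f\n", local, partial);
    return 0;
}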

Configuration Language

Figure 3 shows a master-slave architecture typically found in parallel programs and its description using the configuration language. Data is a port class that defines an RPC-like transaction; int[2] and double are the data types used for the in and out call arguments, respectively. Calc_Pi is a composite class that uses the primitive classes Master and Slave; it defines enclosed instances of these classes (master and slave) as well as their ports and connections. Note that the ports named input and output[i] are of the Data class. The last line specifies the creation of an instance of Calc_Pi, named calc_pi, with a variable number of slave instances. The declarative system description is submitted to a command interpreter that executes the low-level calls required to create the specified parallel program. The interpreter queries the user for the number of replicated instances to be actually created in the system. The instances are automatically distributed to the available set of processors. However, the user can override this allocation by defining specific mappings at the configuration level. The configuration language interpreter uses a parser written in Tcl [5] and supports flow control instructions that facilitate the description of complex configurations [1].

In a parallel execution environment it is necessary to isolate different programs. In our proposal, the class instances that compose a program can interact only with instances included in the same naming domain, which is defined by its main class name. The module classes, however, can be used to configure different programs. This allows a library composed of a set of module classes to be used by different parallel programs. The sharing of the library code depends on the particular run-time implementation.


port_class Data { sync, int[2], double }

class Calc_Pi { N } {
    class Master { N } {
        code C "master $N";
        port out Data output[$N];
    }
    class Slave { } {
        code C "slave";
        port in Data input;
    }
    Master master $N;
    Slave slave[$N];
    forall i $N {
        link master.output[$i] slave[$i].input;
    }
}

Calc_Pi calc_pi variable;

Figure 3. An Example of the Configuration Language
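The sequential code of a primitive module is written separately; the code C "slave" clause above binds the Slave class to a C program. As a rough illustration, that program could be structured along the following lines. The prio_port_in, prio_receive and prio_reply names are assumptions made for this sketch only; they stand in for the programming-level port and send/receive primitives described later in the paper, whose real syntax may differ.

/* Hypothetical sketch of the "slave" module of figure 3. All prio_*
 * names are illustrative assumptions, not the actual P-RIO API. */
extern void prio_port_in(const char *name);   /* declare an in port */
extern void prio_receive(const char *port, void *buf, int len);
extern void prio_reply(const char *port, const void *buf, int len);

/* Integrate 4/(1+x^2) over steps [lo, hi) out of n total steps. */
static double pi_slice(int lo, int hi, int n)
{
    double h = 1.0 / n, sum = 0.0;
    for (int i = lo; i < hi; i++) {
        double x = h * (i + 0.5);
        sum += 4.0 / (1.0 + x * x);
    }
    return h * sum;
}

int main(void)
{
    int range[2];                 /* the int[2] in arguments      */

    prio_port_in("input");        /* the port named in the config */

    for (;;) {                    /* serve RPC-like transactions  */
        prio_receive("input", range, sizeof range);
        double partial = pi_slice(range[0], range[1], 1 << 20);
        prio_reply("input", &partial, sizeof partial);
    }
}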

Group Transactions

Group communication abstractions fit well in the configuration model (figure 4). A group can be seen as an abstract module that receives inputs and distributes them according to a particular transaction style. In practice, we use a simplified representation for group configurations, and the implementation of the associated transaction style will depend on the particular communication substrate available in the support environment.


Figure 4. A Communication Group

In our model, groups are explicitly named for connection purposes, and several collective message-passing transaction styles can be selected at the configuration domain. Configuration-level primitives are available to create and name groups, as well as to specify port-to-group connections. Our current implementation supports two multicast group transaction styles, unidirectional and bi-directional, for use with the asynchronous and synchronous transactions, respectively.

It is noteworthy that the popular all-gather, scatter and gather, and all-to-all collective data moves can be implemented using the standard features of P-RIO. All-gather can be achieved through unidirectional group transactions. Scatter and gather can be implemented by synchronous transactions associated with a set of ports, one for each target. Finally, all-to-all is similar to scatter and gather but uses asynchronous transactions. Alternatively, for performance reasons, a specialized library such as MPI could be used to support these transaction styles [6].

Barrier synchronization mechanisms can also be implemented using a group transaction through ports. However, we chose to offer a specific programming-level primitive for this function, in order not to burden the programmer with port concerns. For the same reason, barrier specifications appear in a simplified form at the configuration level.

In our project we experimented with three basic approaches to provide group transaction support: (i) implementing it from scratch and embedding it in the run-time support; (ii) using special support modules interposed between the involved application modules; (iii) simply using the resources of a run-time support library, such as PVM. Although the approaches sometimes overlapped, the best choice is a question of trading off efficiency, flexibility and implementation cost.
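For illustration, the barrier primitive and a unidirectional group transaction might appear at the programming level as sketched below. The prio_barrier and prio_group_send names are assumptions of ours; the paper does not specify the actual primitive names.

/* Sketch of a barrier followed by a multicast to a named group.
 * prio_barrier and prio_group_send are assumed names, not the
 * actual P-RIO API. */
extern void prio_barrier(const char *name);
extern void prio_group_send(const char *group, const void *buf, int len);

void exchange_row(const double *row, int n)
{
    /* Wait until every participating module reaches this point. */
    prio_barrier("step_sync");

    /* Multicast this module's row to all members of the group; if
     * every member sends its own row, the net effect is an
     * all-gather built from unidirectional group transactions. */
    prio_group_send("rows", row, n * (int)sizeof(double));
}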

Figure 5. Distributed Shared Memory Transaction: (a) Configuration Structure, with clients C[0], C[1] and C[2] connected to module S; (b) Implementation Structure, with S replicated at each client node and kept consistent by a coherence protocol

Shared Memory Transactions

In previous sections, we have shown how the configuration paradigm provides a flexible framework for the construction of message-passing based parallel programs. In this section, we present a scheme that we adopted to introduce the distributed shared memory (DSM) paradigm in the context of P-RIO. This provides a hybrid environment, helping the programmer to mix the two paradigms in order to implement the best solution for a specific problem.

In figure 5-a, let us consider that module S can perform heavy processing on a data structure on behalf of any module C[i]. The latter concurrently compete to access these processing services through procedure call transactions associated with ports visible at the interface of S (for simplicity, only one port is presented in figure 5). As an implementation option, S and its encapsulated data structures can be replicated at each hardware node where there is a potential user of its services. Thus, the procedure code can be executed locally, providing a simple computation and data migration policy. A procedure that performs read-only data operations does not need to execute any synchronization code. However, if a procedure performs updates to the data of S, they must be implemented using a shared memory coherence protocol. Configuration-level analysis, similar to the compile-level analysis used in Orca [7], could determine that module S is to be replicated. This could also be done by the programmer, using knowledge regarding program behavior and structure.

According to our programming model, each module instance acts as a monitor; a port defines a monitor entry point. Thus, each procedure call transaction can be associated with a mutual exclusion region and, in consequence, is equivalent to the acquire primitives used in DSM libraries such as Munin or TreadMarks. Special code associated with each port can be used to trigger the required underlying consistency protocol. It is also possible to use explicit primitives, such as acquire and release, internally to the module code. This allows more flexibility at the expense of program structuring. It is interesting to note that the data sharing protocol and the mapping of modules to processors can be specified at the configuration level, allowing great flexibility for system tuning.
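When the explicit primitives are preferred, a replicated module S could wrap its updating procedures as sketched below. The acquire/release names follow the DSM libraries cited above; the prio_ prefixed forms and the region argument are assumptions made for this sketch.

/* Sketch of explicit consistency primitives inside a replicated
 * module S. prio_acquire/prio_release are assumed names in the
 * spirit of the Munin and TreadMarks primitives. */
extern void prio_acquire(const char *region); /* enter region, fetch updates   */
extern void prio_release(const char *region); /* leave region, publish updates */

static double table[1024];    /* data structure encapsulated by S */

/* Read-only service: no synchronization code is needed. */
double s_read(int i)
{
    return table[i];
}

/* Updating service: must run under the coherence protocol. */
void s_update(int i, double v)
{
    prio_acquire("table");
    table[i] = v;
    prio_release("table");
}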

Graphical System Composition

P-RIO includes a graphical interface that helps to configure, visualize, monitor and debug parallel applications. The graphical representation of each class (icon), including its ports, and of a complete system is automatically created from its textual configuration description (figure 6). Alternatively, using the graphical tool, a system can be created by selecting classes from a library, specifying instances of them and interconnecting their communication ports. Menus are used to change defaults for port connections, and special icons are available to represent different group communication styles. Modules using barrier synchronization can be identified by a special mark, and the set of modules using the same barrier can be highlighted by clicking on this mark. The encapsulation concept is supported, permitting new module classes to be composed graphically from already existing classes. Class compression and decompression features allow the designer to inspect the internals of composite classes and simplify the visual exposition of large programs. With this tool, it is also possible to control the physical allocation, instantiation, execution and removal of modules with simple mouse commands. For example, using these features, one can reconfigure the application on the fly by stopping modules, reallocating them, or changing several other parameters. Program configuration changes made at the graphical interface level are automatically reflected in the textual representation, which can be saved for reuse. The system does not attempt to optimize the representation of repetitive program structures. However, the user can obtain this effect by editing the automatically generated textual representation.

Figure 6. Snapshot of the P-RIO Graphical Interface

The graphical interface borrows many features from the XPVM‡ tool [3], contained in the PVM system. Thus, it supports debugging primitives for displaying the messages sent and print-like outputs. This interface was implemented using the Tcl/Tk toolkit [5], being very portable as well.


Figure 7. Image Recognition System Using Artificial Neural Networks (an image passes through a preprocessing stage before being presented to the neural classifier)

‡ XPVM is copyright of the Oak Ridge National Laboratory, Oak Ridge - TN, U.S.A., 1994.

An Example

As an example, we present the use of P-RIO to parallelize the implementation of an experimental image recognition system (figure 7). It uses a neural network classifier and preprocessing techniques that make the result independent of the relative position and scale of the target image. The preprocessing stage uses common digital processing techniques (filtering, transforms and convolutions) that impose high processing demands as well as a high rate of repetition of operations over the bi-dimensional matrix that represents the image.

The image is represented by a matrix of 256x256 pixels, with 256 gray-scale levels. To reduce spurious noise a median filter is applied to the image, followed by a border detection algorithm to obtain the object position within the vision field. In parallel, homomorphic processing is applied to the original matrix in order to avoid distortions that can be caused by the light source. The outputs of the two stages are used to generate a polar coordinate representation of the image, from which circular harmonic coefficients are then obtained. Finally, the Mellin transform is applied to obtain a 128x512 matrix of complex coefficients, which are partially presented to the neural classifier. Figure 8 presents the main functions performed by the preprocessing phase.

Considering the sequential implementation of this system, we identified two points for performance improvement:

1. In the image preprocessing phase, the edge detection task takes 70% of the total machine time. This task could be subdivided into smaller tasks that can be executed in parallel, taking advantage of the properties of the adopted bi-dimensional signal processing algorithm. This parallelization requires small changes in the original program.

2. The neural net has 16384 inputs and two layers with 100 and 5 neurons, respectively. Both the learning and recognition tasks take a long CPU time. The neural net was subdivided into a number N of sub-blocks, each one containing a subset of the neurons. The N blocks can be executed in parallel, being sequentialized when there is processing between the different layers (a schematic sketch of this partitioning is given below). This scheme works well for neural nets with a large number of neurons in each layer. However, it is not useful when the layers have a small number of neurons, because the overhead incurred by message passing is greater than the time spent in processing each partition of the net.
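To make the decomposition of item 2 concrete, the following schematic sketch shows one block computing the outputs of its subset of a layer's neurons. The function shape, names and weight layout are our reconstruction for illustration, not the project's actual code; only the dimension INPUTS is taken from the text. The N blocks each run this step in parallel and then exchange their partial outputs before the next layer is processed.

/* Schematic sketch of the neural-net partitioning: each block holds
 * 'count' of a layer's neurons and computes only their outputs.
 * This is an illustrative reconstruction, not the original code. */
#define INPUTS 16384              /* network inputs, from the text */

void block_forward(const float in[INPUTS],
                   const float w[][INPUTS], /* weights of this block only */
                   float out[],             /* full layer output vector   */
                   int first, int count)    /* neurons assigned to block  */
{
    for (int j = 0; j < count; j++) {
        float acc = 0.0f;
        for (int i = 0; i < INPUTS; i++)
            acc += w[j][i] * in[i];
        out[first + j] = acc;     /* partial result; blocks exchange
                                     outputs before the next layer  */
    }
}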


Figure 8. The Preprocessing Phase of the Image Recognition System

Figure 9 presents the module configuration that implements the system.

The median filter and edge detection functions are encapsulated in one composite class called center; in this class, edges is the basic unit of parallelization. It is interesting to note that the number of instances of edges is specified by a parameter in the configuration description text. This provides a convenient means to tune the performance of the system. The other functions of the preprocessing stage are encapsulated in a single module, represented as g_mellin in figure 9. However, due to its internal structure, it can perform the homomorphic processing function in parallel with the edge detection function. The pipelined transformations included in g_mellin are quite simple and were not split up into individual modules. Although this could easily be done, the speed-up that could be obtained in a network of workstations would not be relevant. The internal structure of the parallel neural net implementation is presented in figure 10. A snapshot of the graphical interface of P-RIO, showing the system's main composite classes, is presented in figure 6.


Figure 9. Image Recognition System

Performance Measures

For the image recognition application, quantitative measures showing the speed-up for the border detection function, performed in the preprocessing phase, are presented in table 1. The sequential preprocessing time of 168 secs was reduced to 36 secs in the parallel version using 5 Sun4 workstations. In this application, the communication costs are very low, allowing an almost linear speed-up.

In the same application, the classification phase takes 4 minutes on one processor. A parallel version, using 5 processors, reduced this time to almost two minutes. In this implementation the neural network weights were stored in standard file system (Sun NFS) files. This introduces a bottleneck when the replicated modules try to access these files concurrently. It could be alleviated by using an I/O system with parallel operation support.

The performance of P-RIO was also compared to that of pure PVM.

We measured the round-trip times for similar message-passing operations and the execution times of a set of programs implemented in both environments. In all the experiments, the measures obtained were almost identical, showing that P-RIO does not introduce significant overheads for parallel programming based on workstations connected by a local area network.

Number of processors    Time (secs)    Speed-up
         1                  168           1.0
         2                   85           1.9
         3                   62           2.7
         4                   44           3.8
         5                   36           4.6

Table 1. Speed-up for the Preprocessing Phase

Figure 10. Snapshot of the Neural Network Module

Programming and Configuration Support

The programming and configuration abstractions of P-RIO are supported by a simple run-time platform. To start transactions through ports, basic send and receive primitives, similar to those commonly found in parallel programming libraries, are available. Using these primitives, the programmer does not need to worry about low-level details like addressing, buffer allocation and flag management. P-RIO also offers an Ada-like select-with-guard communication constructor based on logic expressions. This constructor allows a module to wait for messages coming from a selected set of ports; the specific port to receive a message is chosen in a non-deterministic way. We also offer a data-flow-like constructor that can be used to atomically receive messages coming from a set of ports.
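A use of the select-with-guard constructor might look as follows. The prio_select and prio_receive names and the guard encoding are assumptions made only for this sketch; the paper states that the constructor is based on logic expressions but does not show its syntax.

/* Sketch of the Ada-like select-with-guard constructor. prio_select
 * and prio_receive are assumed names, not the actual P-RIO API. */
extern int  prio_select(const char **ports, const int *guards, int n);
extern void prio_receive(const char *port, void *buf, int len);

void serve_once(int have_capacity)
{
    const char *ports[] = { "requests", "results" };

    /* A port is eligible only when its guard expression is true. */
    int guards[] = { have_capacity, 1 };

    /* Block until an eligible port has a pending message; among
     * several pending ports, one is chosen non-deterministically. */
    int chosen = prio_select(ports, guards, 2);

    char buf[256];
    prio_receive(ports[chosen], buf, sizeof buf);
    /* ... handle the message according to 'chosen' ... */
}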

In addition, auxiliary timing functions for communications (time-outs) and a mechanism for the propagation of exceptions between communicating modules are available. The use of these mechanisms is optional. However, it should be noted that they can be used to include error recovery or error reporting features in parallel programs. This can be essential to achieve the reliability levels required to use parallel techniques in real-world applications.

Our current implementation of the support primitives is based on C. However, the P-RIO primitives for configuration and interaction are compatible with a wide range of procedural and object-oriented languages; their invocation syntaxes are easily mapped over these primitives. A small set of programming-level configuration primitives is required to define the structure of a processing module, including its set of ports. These primitives are embedded in the modules' sequential programs and are required to create part of the run-time context needed to support the corresponding high-level abstractions. An extra set of configuration primitives is available to support the high-level constructs of the configuration language. These primitives are called by the configuration language interpreter to impose the particular module and interconnection structure of a program.

Related Work

There are many implementations of parallel programming environments. We restrict our discussion to some relevant proposals that adopt a two-step program composition methodology and include a graphical interface; these two features are basic in our proposal.

The Computationally Oriented Display Environment (CODE) allows the programmer to draw a graph and insert textual annotations associated with nodes (circles that represent sequential programs) and arcs (connections) [8]. A flexible data-flow style of programming is adopted. Named input and output ports (queues of data, similar to those of P-RIO) are used to define each node interface. The node ports do not explicitly appear at its interface, and an annotation associated with each arc is used to define the binding of an output port to an input port. To each node is associated a sequence of annotations that define the type and name of its ports, the input firing rules, the code to be activated and the output routing rules. These annotations allow the programmer to define explicitly the dynamic interaction behavior of each node. In its basic aspects P-RIO is similar to CODE. However, P-RIO input and output ports appear explicitly at the module interface, giving more details of the program structure. As noted in [8], this can simplify the graphic representation of many programs. In addition, the port transactions can be directly mapped to method or procedure calls in most languages, giving a clear picture of the functions provided by each module.

Hence is an experimental graphic language that uses special icons to represent control flow constructs. By interposing these icons in a graphic program, loops, parallel replication, pipelines and conditional execution of subgraphs can be specified. However, the language makes it difficult to know the dynamic behavior of a program and to specify some firing rules. A more complete description of Hence and CODE is available in [8].

Paralex is an environment based on a very restricted data-flow model [9]. The activation of a computational unit obeys the so-called “strict enabling rule”, and a unit performs a multi-function mapping between a number of inputs and outputs. In addition, the computational units do not keep internal state and do not interact with other components. These restrictions ease the support of fault tolerance based on module replication and of a simple load balancing policy.

The P-RIO environment allows configuration changes to be performed on an already started system; this speeds up experimentation with different versions of algorithms. A simple process migration policy based on the redirection of messages is supported. The system does not embed support for dynamic load balancing or fault tolerance policies. However, policies not requiring state preservation, such as the one proposed in [9], could easily be supported. It is possible to embed the configuration control constructs in the code of the modules. This provides great flexibility for configuration control.

Tracs is a graphic environment that allows the programmer to define typed messages, computational modules (including their ports) and static architecture models through special windows and menus [10]. The approach tries to fill the gap that exists between the module programming and configuration programming levels. Features such as these can easily be added to the environments previously commented on, and also to P-RIO, which already provides tools for system configuration. It is interesting to note that fixed-length data blocks, associated with typed port messages, can be a burden when the modules interact by moving dynamically sized data structures. P-RIO circumvents this problem by offering a primitive that allows the size used in a specific data interaction to be set up. Lastly, modules in Tracs have a single input queue and no firing rules; this can make some module interaction patterns difficult to implement.

The Visual Parallel Environment (VPE) allows the user to create, compile and run PVM programs [8]. The applications (programs) are represented by graphs that consist of nodes and arcs. Computing nodes represent sequential tasks that use named ports for communication; ports may be wired through arcs. Call nodes are graphs composed of computing nodes and other call nodes that export their ports through special interface nodes. VPE exploits the concepts of modular and reusable components to assist the process of creating a parallel application. On the other hand, P-RIO attempts to exploit modularity and reuse both in the construction and in the execution stages of parallel programs. In addition, useful transaction styles typically used in parallel programming are readily available in our environment.

Our environment adopts the configuration paradigm proposed in Conic [2] for constructing distributed systems. Its main attraction is to facilitate the association of specific features with modules and data interaction channels. We extend the configuration framework, including features for parallel programming. The use of a system description language has other attractions: the functional modules can be designed using an object-oriented methodology, which encourages reuse and can be valuable for the development of complex applications; it makes the system amenable to formal verification techniques and also to program flow analysis techniques; the text is a portable and compact representation of the software that can be mapped to graphic representations (and vice-versa); and the use of indexes and flow control constructs allows the specification of large program structures.

Conclusions

Our proposal encourages a software architecture view of application development. It is centered on the configuration paradigm, which provides high-level and flexible abstraction mechanisms for program construction.
This strategy has allowed us to pick up and put together several useful concepts (and associated mechanisms) for parallel computing, providing a “middleware”-based programming and execution environment.

The available features can be quickly selected and used to configure parallel and distributed programs. The current version exploits the portability and interoperability between hardware architectures provided by PVM. Performance evaluation tests have shown that P-RIO does not introduce measurable overheads to message communication when compared to a pure PVM system. The original PVM visualization and debugging tools can also be used with P-RIO systems. In addition, P-RIO provides a graphical interface that supports an object-based methodology for program construction and allows the user to inspect and control program execution.

Several extensions to the current environment and to its graphical interface can still be introduced. The shared memory extensions are not yet integrated in the environment. We are planning to implement them using a DSM coherency protocol that minimizes latency by transferring updates in advance to the processor most likely to need them. Another, earlier, version of our environment (named RIO) maps each computational node on a lightweight process (thread) and is intended for distributed computing [1]. It supports group communication protocols with different failure semantics and a set of fault-tolerance techniques based on module replication. This experience has helped to convince us of the flexibility provided by the configuration approach.

Using P-RIO and its graphical tool, a functional prototype for a parallel system can be quickly assembled. This facilitates experimenting with different solutions for a particular problem. We believe that engineers and scientists, with very little training, can build relatively complex systems using P-RIO. The current version of the run-time support environment, including the graphical tool, has been operational since November 1994 and has been used to build several experimental applications. Its code, manuals and additional documentation can be obtained at http://www.caa.uff.br/projects/p-rio.

Acknowledgments

Grants from CNPq (Brazilian National Research Council), Faperj (Rio de Janeiro State Research Support Foundation) and Finep (Studies and Projects Funding Agency) have partially supported the project described here. We are also grateful to many of our M.Sc. students who worked on earlier versions of the environment and provided valuable feedback that has contributed to improving the present version.

References

1. E. Carrera, P-RIO: Metodologia para a Construção de Sistemas Distribuídos Configuráveis sobre PVM, M.Sc. Dissertation (in Portuguese), Electrical Engineering Department, PUC/Rio, Rio de Janeiro, Brazil, 1996.

2. J. Magee, J. Kramer and M. Sloman, “Constructing Distributed Systems in Conic”, IEEE Trans. on Software Engineering, Vol. 15, No. 6, June 1989, pp. 663-675.

3. G. A. Geist et al., PVM: A Users' Guide and Tutorial for Networked Parallel Computing, The MIT Press, Cambridge, Mass., U.S.A., 1994.

4. D. A. Wallach et al., “Optimistic Active Messages: A Mechanism for Scheduling Communication with Computation”, Fifth ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming, Santa Barbara, Calif., U.S.A., July 1995, pp. 217-226.

5. J. Ousterhout, Tcl and the Tk Toolkit, Addison-Wesley, Reading, Mass., U.S.A., 1994.

6. J. Dongarra, S. W. Otto, M. Snir and D. Walker, “A Message Passing Standard for MPP and Workstations”, Communications of the ACM, Vol. 39, No. 7, July 1996, pp. 84-90.

7. H. Bal, M. F. Kaashoek and A. Tanenbaum, “Orca: A Language for Parallel Programming of Distributed Systems”, IEEE Trans. on Software Engineering, Vol. 18, No. 3, March 1992, pp. 190-205.

8. J. C. Browne, S. I. Hyder, J. Dongarra, K. Moore and P. Newton, “Visual Programming and Debugging for Parallel Computing”, IEEE Parallel & Distributed Technology, Spring 1995, pp. 75-83.

9. O. Babaoglu, L. Alvisi, A. Amoroso, R. Davoli and L. A. Giachini, “Paralex: An Environment for Parallel Programming in Distributed Systems”, Proceedings of the 6th ACM International Conference on Supercomputing, July 1992, pp. 328-342.

10. A. Bartoli, P. Corsini, G. Dini and C. A. Prete, “Graphical Design of Distributed Applications Through Reusable Components”, IEEE Parallel & Distributed Technology, Spring 1995, pp. 37-50.
