Energy Efficiency on Scalable Computing Architectures




2011 11th IEEE International Conference on Computer and Information Technology

Energy Efficiency on Scalable Computing Architectures Carlos J. Barrios Hernández, Daniel A. Sierra HPCC Unit CEMOS Group Industrial University of Santander Bucaramanga, Colombia [email protected], [email protected]

Sébastien Varrette CSC Research Unit University of Luxembourg L-1359, Luxembourg [email protected]

Abstract—Nowadays, power consumption in computer systems is an active and important subject of discussion in both research and political communities. Indeed, increasing the performance of such computer systems frequently requires increasing the number of resources, thus leading to higher power consumption and a negative impact on the environment. Some strategies to reduce energy consumption are based on hardware modifications, the usage of energy-aware software, and new rules for the utilization of computing resources. This paper analyzes the energy consumption of HPC platforms and Grid computing infrastructures from two different perspectives: (i) cost in energy due to idle/active status and (ii) cost in energy due to data transfer.

power regarding the increments in activity. The authors focused on mobile and embedded devices to describe the dynamic behavior of power consumption. Furthermore, they discussed the different possible infrastructure states (peaks, idle, normal use) that cause the variation in power consumption, applicable to scalable architectures when differentiating data processing and data transfer. Considerations on power management and data transfers are not a new topic. Several works related to power management mainly concern Internet connections and business networks [6]. In a cloud infrastructure, around 30% of the total costs of a data center involve roughly 15% in power draw and 15% in networking for data transfer [8]. Abts et al. [1] have proposed energy-proportional data center networks to treat power management in relation to network traffic. In particular, they show that there is a significant advantage in having independent control of each unidirectional channel comprising a network link, since many traffic patterns show very asymmetric use, and that system designers should optimize high-speed channel designs to be more energy efficient by choosing optimal data rates and equalization technology. The energy-proportional idea has been used to propose strategies for managing power consumption in data transfers, with schedulers implemented in hardware (networking devices) or in specific middleware [4], [11], [16]. As scalable systems run multiple workloads concurrently, applications often generate different runtime states that make energy-efficient job scheduling difficult. Energy-proportional techniques therefore involve workload management, mainly in schedulers. In addition to the power consumed in data transfer, the power costs associated with processing are also important. Besides hardware modifications of physical components, heterogeneous workloads themselves turn power consumption into a scheduling and optimization problem.
Keller and Gruber [9] and Schulz [21] proposed several green high performance computing methods and strategies applied to scalable platforms. An interesting aspect in all cases is that solutions are proposed based on scheduling strategies, independently of the architecture. More precisely, in terms

Keywords-green computing; scalable architectures; efficient power consumption; performance of systems;

I. INTRODUCTION

Energy efficiency in computer systems has emerged as an important and active research field due to its implications for system performance, carbon dioxide footprint, and operational cost. According to the 2010 Key World Energy Statistics Report [13] of the International Energy Agency (IEA) [12], the world's energy consumption in the last 30 years has almost doubled. For example, energy consumption between 1973 and 2008 increased by more than 80%. This increase is due to the high demand for and use of electronic elements, such as computing devices. Electric power consumption in computer systems depends on direct and indirect factors, ranging from the technological characteristics of the computer system and the devices that ensure a suitable environment (i.e., to dissipate heat) to the interaction with other systems or humans (i.e., data transfer energy cost, user demand). HPC and scalable architectures offer good examples of efforts from different perspectives to address the problem of electric power consumption: code optimization, hardware optimization, virtualization, scheduling strategies, and others [8], [10], [21]. Barroso and Hölzle [2] have described the case for energy-proportional design of computer systems to allow energy savings without degrading performance efficiency. Basically, energy-proportional computing systems consume almost no power when idle and gradually consume more

978-0-7695-4388-8/11 $26.00 © 2011 IEEE. DOI 10.1109/CIT.2011.108

Dino Lopez Pacheco I3S Laboratory Nice-Sophia Antipolis University Sophia Antipolis, France [email protected]


of software/applications, different approaches to reduce energy consumption while executing parallel applications efficiently investigate scheduling algorithms in both sharedmemory and distributed architectures. Carissimi et al. [5] discuss the impact of the energy consumption in different systems. The authors suggest that although the energy consumption may be determined in most cases, some interesting issues are still in place and should be addressed to improve the energy efficiency of the systems. These issues involve the use and management of idle resources, the influence of the workload variation, service requests and traffic (in terms of scheduling). The aim of this paper is to contribute in the study of energy efficiency on scalable architectures from a holistic perspective, based in three main points: first, we analyze idle and active status on the systems, observing processing and data transfer. Second, as this approach involves Cluster and Grid Computing platforms, implying HPC architectures, we observe high bandwidth networks with a demand variation. Third, we discuss differences between two processing technologies.

Figure 1. HPC Architecture and Average of Energy Consumption

Power consumption due to the operation of nodes, involving processing, storage, and direct data transactions between processors and file systems, is estimated at 25%. In this estimation, associated devices (i.e., fans or coolers) are not taken into account. However, power consumption can increase significantly, to more than 50%, if GPUs are part of the system [17], [19]. The top of Figure 1 presents the power consumption related to the execution of applications. Depending on the state generated by the application, the power consumption can range from an unknown (normally low) value up to 50%. Clearly, the execution of applications has an impact on the power consumption of the other levels; therefore, the optimization of applications and efficient scheduling improves the ratio of performance to power consumption, in other words, the power efficiency.

The rest of this paper is organized as follows: Section II presents an overview of energy-proportional costs in a typical HPC platform and its extension to a large-scale architecture. Section III shows preliminary results of basic measures from an analytical model. Section IV presents a discussion centered on energy awareness and the idle and active status of scalable architectures. Further work is presented in Section V and, finally, the conclusions are shown in Section VI.

II. ENERGY-PROPORTIONAL COSTS IN LARGE-SCALE ARCHITECTURES

A Grid Computing platform is a highly scalable architecture [7]. In Grid Computing platforms, some specific characteristics have an impact on performance. However, the most common components of a Grid impacting the ratio of performance to energy consumption are connections using high-bandwidth networks, common collective services and middleware, the heterogeneity of sites and resources, high and varied workload requirements, volatility of resources, concurrency, and geographical distribution.

The performance of any scalable system is sensitive to the increase of its components, regarding the individual characteristics of each of its elements. In highly scalable architectures, these increases are associated with consumption of resources and a growth of load. Then, observing energy proportionality, an increment in average workload implies a proportional increase in power consumption [2]. Observing a typical HPC platform architecture, we can distinguish basic component levels. Figure 1 shows the average energy consumption associated with each element level of the HPC architecture. The interconnection network switch that links the nodes below accounts for 5% of the total power consumption in the best case. This 5% may increase to 10% or 15% depending on the network technology. Usually, faster networks consume more energy, in accordance with the energy-proportional characteristic exposed before.
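The energy-proportional behavior described above can be sketched numerically. The following Python fragment is illustrative only (the constants and function name are ours, not from the paper): it models a node whose power grows linearly with average workload on top of a fixed idle floor.

```python
# Energy-proportional sketch, after Barroso and Hoelzle [2]: power grows
# with average workload on top of a fixed idle floor. Illustrative numbers.
def node_power(idle_watts, peak_watts, utilization):
    """Linear power model: idle floor plus a workload-proportional part."""
    return idle_watts + (peak_watts - idle_watts) * utilization
```

An ideally proportional node would have `idle_watts = 0`, so that an idle machine consumes almost nothing, which is the behavior [2] argues for.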

Figure 2 shows the average energy consumption in a Grid Computing architecture. This figure shows the same average energy consumption for a Grid Computing architecture as for an HPC architecture (Figure 1). At the top of Figure 2, we present a horizontal bar with the same color description for each average power consumption, adding a white value for very low (possible) values. The horizontal bar takes into account resource heterogeneity (each of the resources can have a specific performance, hence a specific energy-proportional


mentioned earlier, the power efficiency can be considered as the ratio of performance to power consumption, related to the performance features given by architectural characteristics. We propose a model in three levels: process and essential hardware modeling, network and communication modeling, and variation modeling for the use of applications. In summary, we consider that the modeled scalable system is composed of a set M of m nodes (homogeneous or heterogeneous) that contain processor/memory elements interconnected by a high-bandwidth network.
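The three-level decomposition above can be sketched as a data structure; all names and fields here are illustrative assumptions, not definitions from the paper.

```python
from dataclasses import dataclass

@dataclass
class Node:
    mips: float        # processing capacity
    e_active: float    # energy rate while active
    e_leak: float      # energy lost while idle

@dataclass
class Network:
    gbps: float        # link capacity
    e_transfer: float  # energy rate during effective transfer
    e_net_leak: float  # losses (packet loss, standby devices)

@dataclass
class ScalableSystem:
    nodes: list        # the set M of m nodes (homogeneous or heterogeneous)
    network: Network
    usage: float = 1.0 # 1 = full, 0.5 = half, ~0 = idle (cf. Table I)
```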

value), network features (devices among the Grid network), and different resource utilization (each site can run different applications).

A. Process and Essential Hardware Modeling

Figure 2. Grid Architecture and Average of Energy Consumption

Processors have different processing speeds, involving different processing performance in terms of MIPS. Nowadays, processing capacity takes into account CPUs and/or GPUs. The number of processing cores in the two types of processing units gives one measurable factor, but Million Instructions Per Second (MIPS) or Floating-point Operations per Second (FLOPS) provide a measurement metric that describes performance. On the other hand, there are other essential hardware components affecting energy consumption, such as RAM and hard disk devices. Also, energy consumption profiles are related to policies and usage patterns with a strong influence on processing [15].


Application execution has an effect on power consumption, in any scalable architecture, involving the performance and power efficiency of the other levels. Therefore, the problem of efficient scheduling and execution of applications to optimize energy consumption is important. A good example is the execution of message-passing applications. In message-passing parallelization, the workload moves between nodes or sites, generating specific loads on elements of the sites, mainly processors and memory. Moreover, the network is affected by the message transfers and collective communications generated on the platform. Depending on the workload and the implemented algorithm, the workload distribution can be efficient and, consequently, the power consumption is directly improved.

Based on the amount of MIPS or FLOPS, the power consumption varies in accordance with the effective workload. Depending on the type of parallelism, this workload can be distributed over CPUs or GPUs using different programming models. Nowadays, it is possible to exploit the fine granularity of problems to address application development and execute those problems on NVIDIA® GPU systems with many-core architectures. Due to the specific architecture features that provide high-granularity problem treatment, the processing performance may increase by a factor of 6x speedup, with an almost direct gap of 6x in power consumption between CPUs and GPUs¹, according to the energy-proportional principle.
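Since energy is power integrated over time, the 6x speedup and 6x power figures above imply roughly equal energy per job, whereas the 10x gap reported for Fermi-class GPUs no longer breaks even. A quick back-of-envelope check (normalized, illustrative units of our own):

```python
# energy = power * time; normalized, illustrative units.
cpu_power, cpu_time = 1.0, 60.0
cpu_energy = cpu_power * cpu_time

speedup, power_gap = 6.0, 6.0            # 6x faster at 6x the power
gpu_energy = (cpu_power * power_gap) * (cpu_time / speedup)
assert gpu_energy == cpu_energy          # proportional: same energy per job

# With a 10x power gap (Fermi-class GPUs, per the footnote) the same
# 6x speedup costs more energy overall:
fermi_energy = (cpu_power * 10.0) * (cpu_time / speedup)
```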

In [20], an approach to optimize power consumption on Grid Computing platforms using scheduling strategies was developed. The approach was based on a two-phase optimization technique: a best-effort scheduling algorithm combined with a dynamic voltage scaling algorithm. The voltage scaling algorithm is applied to reduce the energy consumption by scaling down the processor voltages to a proper level (Dynamic Voltage Scaling, DVS [3]), thus extending the execution time of jobs without degrading the performance computed by the best-effort scheduling algorithm. However, the results suggested a more rigorous analysis taking into account infrastructure features, scalability in the number of tasks under high workloads, and the influence of traffic.
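The DVS trade-off can be illustrated with the standard cubic power model (dynamic power scales as V²f, with f proportional to V, so power scales as s³ when both are scaled by a factor s). This sketch is our simplification of the idea, not the scheduler of [20]:

```python
def dvs_energy(base_power, base_time, scale):
    """Energy after scaling voltage/frequency by `scale` (0 < scale <= 1).

    Power drops cubically while execution time stretches linearly, so the
    net energy falls roughly quadratically with the scaling factor.
    """
    power = base_power * scale ** 3   # P ~ s^3 under the V^2 * f model
    time = base_time / scale          # slower clock lengthens the job
    return power * time               # net energy ~ s^2
```

Running a job at 80% frequency inside its schedule slack, `dvs_energy(100, 10, 0.8)` yields 640 versus 1000 at full speed, which is the kind of saving the two-phase approach exploits.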

Exploiting a first approach using DVS, we consider a basic model for a minimum energy voltage observing workload activity. Then, from [20] and [22], the Processing Energy Consumption (PEC) by MIPS is estimated as:

PEC = (E_Active + E_Leak) × MIPS    (1)

where E_Active is the energy consumed in the active state and E_Leak is the energy lost during the idle state. It is important to mention that indirect energy consumption (associated with fans and coolers) due to heat dissipation or external devices is not considered in Eq. (1). To consider a

At this point, the statement of the problem involves an adequate description of the main components to describe power efficiency in terms of the architectural elements of the system. As

1. However, due to the heat dissipation related to the operation of each GPU, in the series known as NVIDIA Fermi®, the gap is 10x [17].


relation of the specific device and the total facility power as a mechanism to estimate power usage efficiency, PUE, [8] proposed

Table I
VARIATION IN USE AND POWER CONSUMPTION

Use          Feature                                       Value
Full         Collateral use - high latency                 1
Half         Half-use latency, or specific all the time    0.5
Low - Idle   Low status caused by networking behavior      0.0 - 0.5

PUE ≈ TotalFacilityPower / DeviceElementPower    (2)

When PUE values remain between 2.0 and 3.0, the process is considered inefficient. Values between 1.2 and 1.7 are considered efficient, 1.2 being a good value.
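The bands quoted above can be encoded directly; this small helper is ours (the text does not say how to classify values between 1.7 and 2.0, so they are left unrated):

```python
def pue_rating(pue):
    """Classify a Power Usage Efficiency value per the bands from [8]."""
    if 1.2 <= pue <= 1.7:
        return "efficient"
    if 2.0 <= pue <= 3.0:
        return "inefficient"
    return "unrated"   # below 1.2, between 1.7 and 2.0, or above 3.0
```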


III. PRELIMINARY RESULTS

A first set of experiments was performed comparing the theoretical values of two specific platforms containing GPUs and CPUs, with a specific test using a comparative benchmark tool to observe speedup and estimate power consumption. A second analysis scales the estimations to two nodes/sites of a highly scalable and reconfigurable platform with specific characteristics.

B. Network and Communication

In this paper, we only consider wired connections and high-bandwidth networks. In the same way as for process modeling, a model for power consumption measured in network devices is proposed. Considering a data transfer rate capacity of 2.5 Gbps (or an extremely good case of 10 Gbps), the Networking Energy Consumption (NEC) is estimated as:

NEC = (E_Transfer + E_Net-Leak) × Gbps    (3)


The benchmark used corresponds to a replica of the Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) [18]. LAMMPS is a classical molecular dynamics code and package generally used to characterize GPU supercomputer platforms. It is written to run well on parallel machines. The CUDA version of LAMMPS is accelerated by moving the force calculations to the GPU. The GPU LAMMPS code is MPI-enabled and scales to a large GPU cluster. LAMMPS runs on single processors or in parallel using message-passing techniques and a spatial decomposition of the simulation domain. The code is designed such that it can be easily modified or extended with new functionality.


where E_Transfer is the energy consumed during the effective transfer and E_Net-Leak is the energy lost in different situations, for example, packet loss and standby network devices. Networking power efficiency is sensitive to the distance and to the devices linking the nodes [1], [15], [16]. Distance is referred to in terms of highly localized systems (as is the case of clusters) and non-localized ones, with devices located far from each other. Following Eq. (2), the Power Usage Efficiency in the Network, N_PUE, can be estimated as:

Test platforms correspond to nodes in sites of the Grid'5000² platform, the French nationwide computing infrastructure for research in high performance computing and distributed systems. At the same time, we used a local platform of the GridUIS-2³ project.

N_PUE = TotalNetworkPowerConsumption / DeviceElementPower    (4)

In this case, values can vary depending on the type of network. Obviously, if values are ≈ 1, there is a balance between the network power consumption and the power efficiency of the device. However, these values can change based on usage variations in the network.
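Assuming the readings PEC = (E_Active + E_Leak) × MIPS and NEC = (E_Transfer + E_Net-Leak) × Gbps for Eqs. (1) and (3), which is consistent with Table II (where Total = Active + Leak at a unit rate), the three estimators can be sketched as:

```python
def pec(e_active, e_leak, mips):
    """Processing Energy Consumption, Eq. (1) (assumed parenthesization)."""
    return (e_active + e_leak) * mips

def nec(e_transfer, e_net_leak, gbps):
    """Networking Energy Consumption, Eq. (3) (assumed parenthesization)."""
    return (e_transfer + e_net_leak) * gbps

def n_pue(total_network_power, device_element_power):
    """Network Power Usage Efficiency, Eq. (4); values near 1 mean balance."""
    return total_network_power / device_element_power
```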

Basically, on Grid'5000 we use the Adonis cluster located in Grenoble. The cluster is a Bullx R422-E2 machine with 12 nodes (12 nodes × 2 CPUs per node = 24 CPUs, and 24 CPUs × 4 cores per CPU = 96 cores), where each pair of Adonis nodes is linked to an NVIDIA Tesla S1070 rack containing 4 Tesla T10 processors; thus each Adonis node has access to two 240-core/4 GB GPUs. However, in our tests the GPU measurements involve 1 Tesla T10 processor. We ran the benchmark 5 times to obtain an average value. Fixed-size means that the same problem with 32,000 atoms was run on varying numbers of processors. Scaled-size means that when run on P processors, the number of atoms in the simulation was P times larger than in the one-processor run. Thus a scaled-size 8-node (32 cores/16 processors) run handles approximately 1,000,000 atoms.
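The scaled-size rule is easy to check: with 32,000 atoms per processing unit, the ~1,000,000-atom figure quoted for the 8-node run matches P = 32, i.e., taking P as the core count (this interpretation is ours):

```python
ATOMS_PER_UNIT = 32_000   # the fixed-size problem, per processing unit

def scaled_atoms(p):
    """Atom count for a scaled-size run on p processing units."""
    return p * ATOMS_PER_UNIT
```

For example, `scaled_atoms(32)` gives 1,024,000, approximately the 1,000,000 atoms reported.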

C. Variation in Usage

Variation in usage corresponds to the dynamic behavior of a high-bandwidth center addressed to high-performance scalable systems. Different scenarios are possible and they are described in Table I. The tables presented in this paper provide measures of total energy consumption; energy consumption during leak status associated with idle or stand-by processes; a metric corresponding to the quantity of FLOPS processed, or gigabits per second transferred among the nodes or cores, depending on the experiment; and the speedup, shown in seconds. Approximate values in Table I are estimated by observing the relation between the network/processing demand and usage and the energy-proportional consumption.
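The Table I values can be read as multiplicative usage factors; this mapping (and the linear interpolation across the low-idle band) is our interpretation, not stated in the paper:

```python
# Assumed mapping of Table I usage states to multiplicative factors.
USAGE_VALUES = {"full": 1.0, "half": 0.5, "low-idle": (0.0, 0.5)}

def usage_factor(state, idle_fraction=0.0):
    """Return the Table I value; low-idle interpolates within [0.0, 0.5]."""
    if state == "low-idle":
        lo, hi = USAGE_VALUES[state]
        return lo + (hi - lo) * idle_fraction
    return USAGE_VALUES[state]
```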

2 https://www.grid5000.fr 3 http://sc3.uis.edu.co


consumption values. In the case of GPU computing, it is mainly due to the technology features (and the associated workload). However, in the case of the generic Grid experiment, it is undoubtedly due to the growth in data transfer costs.

Using a specific platform with 8 CPUs (Intel Xeon E5520) with 24 GB RAM and a 2.5 Gbps Gigabit Ethernet network, we can estimate power consumption values using the expressions presented above. Table II shows the estimated values. In the tables, PEC corresponds to the processing energy consumption and N_PUE is the network power usage efficiency.

Variations in demand in the case of the Grid test with the benchmark used are not easy to capture. We observe a dynamic range in the general usage of the network and we can confirm important variations in the measures. However, they cannot be associated with our tests directly, because they correspond to external measurements. In this case, it is necessary to observe the dynamic range in accordance with the middleware to vary the data, or to use a specific application that allows these measurements.

Table II
HPC PLATFORM POWER ANALYSIS

        Total   Active   Leak   Metric       Speedup
PEC     1.1     0.9      0.2    0.8 GFlops   23 s
N_PUE   0.5     0.4      0.1    0.8 Gbps     0.07 s

V. CONCLUSION AND PERSPECTIVES

Table III shows the power consumption for an NVIDIA Tesla C2050 architecture with 448 cores using the CUDA version.

We have presented an analysis of energy efficiency on scalable architectures, observing the basic principle of energy-proportional consumption. Our generic modeling allows us to observe processing, traffic, and demand as a holistic approach to the performance evaluation of the platform-application relation. Measurements are associated with specific workloads, and we can take into account differences in architecture and technology.

Table III
HPC PLATFORM POWER ANALYSIS USING CUDA

        Total   Active   Leak   Metric       Speedup
PEC     4.7     3.9      0.8    0.9 TFlops   6 s
N_PUE   0.5     0.4      0.1    0.8 Gbps     0.0003 s

This analytical approach to platform characteristics related to energy efficiency on scalable architectures raises two main questions. First, how can we measure the energy efficiency of scalable architectures independently of the network topology, taking into account differences in technology and adaptation to varying resource demands? Second, what type of model effectively describes power consumption and predicts the performance of applications in relation to the energy-proportional principle? The approach presented in this paper lays out avenues to develop these two questions, searching for efficient mechanisms to minimize the impact of applications on energy consumption on scalable architectures.

In Table IV, we present results for power consumption in a similar experiment on two remote nodes of a Grid platform. We have only considered two CPU processors, MPI instructions, and the same quantity of jobs as for the HPC platform.

Table IV
GRID PLATFORM POWER ANALYSIS

        Total   Active   Leak   Metric       Speedup
PEC     1.1     0.9      0.2    0.8 GFlops   23 s
N_PUE   1.5     1.0      0.5    0.8 Gbps     15 s

A discussion of the results presented in Tables II, III, and IV follows.

ACKNOWLEDGMENT

Experiments presented in this paper were carried out using the Grid'5000 experimental testbed, under development by the INRIA ALADDIN development action with support from CNRS, RENATER, and several universities as well as other funding bodies (see https://www.grid5000.fr). At the same time, many experiments presented in this paper were carried out using the GridUIS-2 experimental testbed, under development at the Universidad Industrial de Santander (UIS) High Performance and Scientific Computing Service development action with support from the UIS Vicerrectoría de Investigación y Extensión (VIE-UIS) and several UIS research groups as well as other funding bodies (see https://sc3.uis.edu.co).

IV. DISCUSSION

Tables II, III, and IV present interesting results for a first performance evaluation. In all cases our analysis is supported by the experimental results regarding the increase of power consumption due to technology features and the energy-proportional principle. The experimental results indicate a trend in the leak status and its relation with the idle status (components waiting for some input to continue processing). However, these results are not conclusive, because the experiments cannot characterize leak states rigorously. In the cases of the GPU experiment and the Grid-based experiment, we can observe an increase in the power


REFERENCES

[15] Le, K., Bianchini, R., Martonosi, M. and Nguyen, T. Cost- and Energy-Aware Load Distribution Across Data Centers. Proceedings of the Workshop on Power-Aware Computing and Systems (HotPower), October 2009.

[1] Abts, D., Kausler, P. and Liu, H. Energy Proportional Datacenter Networks. Proceedings of the International Symposium on Computer Architecture, 2010.

[16] Nedevschi, S., Chandrashekar, J., Liu, J., Nordman, B., Ratnasamy, S. and Taft, N. Skilled in the art of being idle: reducing energy waste in networked systems. Proceedings on 6th USENIX Symposium on Networked Systems Design and Implementation, 2009.

[2] Barroso, L.A. and Hölzle, U. The Case for Energy-Proportional Computing. IEEE Computer, Vol. 40, 2007.

[3] Burd, T. D., Pering, T. A., Stratakos, A. J. and Brodersen, R. W. A Dynamic Voltage Scaled Microprocessor System. IEEE Journal of Solid-State Circuits, Vol. 35(11), 2000.

[17] NVIDIA Corporation http://www.nvidia.com

[4] Chabarek, J., Sommers, J., Barford, P., Estan, C., Tsiang, D. and Wright, S. Power Awareness in Network Design and Routing. Proceedings in IEEE-INFOCOM 2008. The 27th Conference on Computer Communications, 2008.

[18] NVIDIA Corporation. NVIDIA LAMMPS Test. http://www.nvidia.com/object/lammps_on_tesla.html

[19] NVIDIA Corporation. NVIDIA GPU Computing Systems Tesla Series Technical Specifications. NVIDIA Corporation, Document SP-04975-001 v04, San Jose, CA, USA, June 2010.

[5] Carissimi, A., Geyer, C., Maillard, N., Navaux, P., Cavalheiro, G., Pilla, M., Yamin, A., De Rose, C., Fernandes, L. Ferreto, T. and Zorzo, A. Energy-Aware Scheduling of Parallel Programs. Proceedings on Latin American Conference on High Performance Computing, 2010.

[20] Pecero Sánchez, J. E., Bouvry, P. and Barrios Hernández, C.J. Low Energy and High Performance Scheduling on Scalable Computing Systems. Proceedings of the Latin American Conference on High Performance Computing, 2010.

[6] Christensen, K., Gunaratne, C., Nordman, B. and George, A. The next frontier for communications networks: power management. Computer Communications, 2004.

[21] Schulz, G. The Green and Virtual Data Center. CRC Press, Boca Raton, USA, 2009.

[7] Foster, I. and Kesselman, C. (Editors) The Grid 2: Blueprint for a Future Computing Infrastructure. Morgan Kaufmann Publishers - Elsevier Inc. New York, United States of America. 2004.

[22] Zhai, B., Blaauw, D., Sylvester, D. and Flautner, K. Theoretical and practical limits of dynamic voltage scaling. Proceedings of the ACM of the 41st annual Design Automation Conference, 2004.

[8] Greenberg, A., Hamilton, J., Maltz, D. and Patel, P. The Cost of a Cloud: Research Problems in Data Center Networks. ACM SIGCOMM Computer Communication Review, Volume 39, Issue 1, ACM, New York, USA, January 2009.

[9] Gruber, R. and Keller, V. HPC@GreenIT, Green High Performance Computing Methods. Springer-Verlag Berlin Heidelberg, Germany, 2010.

[10] Heath, T., Diniz, B., Carrera, E. V., Meira Jr., W. and Bianchini, R. Energy Conservation in Heterogeneous Server Clusters. Proceedings of the 10th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), June 2005.

[11] Heller, B., Seetharaman, S., Mahadevan, P., Yiakoumis, Y., Sharma, P., Banerjee, S. and McKeown, N. ElasticTree: Saving Energy in Data Center Networks. Proceedings of the 7th USENIX Symposium on Networked Systems Design and Implementation, 2010.

[12] International Energy Agency, IEA. http://www.iea.org

[13] International Energy Agency, IEA. Key World Energy Statistics 2010. International Energy Agency Report. Paris, France, October 2010.

[14] Kliazovich, D., Bouvry, P. and Khan, S.U. DENS: Data Center Energy-Efficient Network-Aware Scheduling. Proceedings of the ACM/IEEE International Conference on Green Computing and Communications (GreenCom), Hangzhou, China, December 2010.

