Energy Efficiency in Large-Scale Distributed Computing Systems

R. Trobec*, M. Depolli*, K. Skala**, and T. Lipić**
* Jožef Stefan Institute, Department of Communication Systems, Ljubljana, Slovenia
** Ruđer Bošković Institute, Centre for Informatics and Computing, Zagreb, Croatia
{roman.trobec, matjaz.depolli}@ijs.si, {skala, tomislav.lipic}@irb.hr

Abstract – The ever-increasing energy consumption of large-scale distributed computing systems such as clusters, grids, and clouds raises social, technical, economic, and environmental concerns. Therefore, designing novel energy-efficient approaches to reduce energy consumption at all levels of the distributed system architecture is of great importance for the whole society. However, the essential step towards the introduction of energy efficiency in large-scale distributed systems is to measure the power consumption accurately, reliably, and continuously in each component of the system. This paper briefly surveys the current approaches for measuring and profiling power consumption in large-scale distributed systems. Furthermore, a practical case study of real-time power measurement in a multi-core computing system, as a basic building block of a distributed computing system, is presented.

I. INTRODUCTION

Large-scale distributed computing systems are composed of a large number of computing and storage resources connected through a network. They include clusters, grids, and clouds. Cluster computing systems can be classified as high-availability clusters and load-balancing clusters. In high-availability clusters, a small number of redundant nodes is used in order to eliminate single points of failure, while in load-balancing clusters a larger number of nodes is used, with the focus on providing an environment for High Performance Computing (HPC). Cluster computing systems are embedded in both grid computing systems and cloud computing systems [1]. However, compared to cluster computing systems, grid computing systems are more loosely coupled, heterogeneous, and geographically dispersed. Cloud computing systems, using virtualization technology, can provide scalable, reliable, and cost-efficient virtual clusters [2][3]. The analysis of their applicability to HPC applications is one of the interesting novel research topics [4][5][6].

As such large-scale distributed computing systems grow in size, adding more and more computing nodes and storage resources, their energy consumption is rapidly increasing [7]. In computer systems, energy (in joules) is the electricity resource that powers the hardware components to do computation, and the rate at which energy is dissipated is power (in watts, i.e., joules per second).
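In symbols (a standard definition, restated here for clarity): if \(P(t)\) denotes the instantaneous power draw, the energy consumed over an interval \([t_0, t_1]\) is its integral,

\[ P(t) = \frac{dE}{dt}, \qquad E = \int_{t_0}^{t_1} P(t)\, dt . \]

For example, a node drawing a constant 100 W for one hour consumes \(E = 100\,\mathrm{W} \times 3600\,\mathrm{s} = 360\,\mathrm{kJ}\), i.e., 0.1 kWh.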

The ever-increasing energy consumption of large-scale distributed computing systems causes higher operating costs (e.g., electricity bills) and has negative environmental impacts (e.g., carbon dioxide emissions). Therefore, when designing such systems, the focus is shifting from pure performance improvements to energy efficiency.

Energy consumption can be reduced at different levels of the distributed architecture: individually on each node at the hardware component level, at the middleware level, at the networking level, and at the application level [8][9][10]. The processor is the component that consumes the dominant amount of energy on each node. Thus, most techniques are designed to reduce the energy consumption of the processor [11][12], but energy reduction techniques have also been proposed for other devices, such as phase-change memory and solid-state disk drives.

Suboptimal use of system resources (e.g., over-provisioning) is the primary source of energy inefficiency at the middleware layer [8][9][13]. In an over-provisioned system, components usually consume excessive power when they are relatively idle or underutilized. Energy-proportional computing [14] introduces the concept of consuming energy in proportion to resource utilization, so that an idle or underutilized component consumes less energy. To improve energy efficiency at the middleware layer, energy-aware resource and workload management algorithms are utilized [13].

Increased network traffic in many bandwidth-sensitive network applications also results in an increased number of networking devices, such as switches and routers, and therefore in increased energy consumption. Thus, different approaches to energy-aware network protocols and network infrastructure are being developed to improve energy efficiency at the network layer [15][16].

Finally, the energy consumption heavily depends on the characteristics of the running application. Some applications are computationally intensive, others are data intensive, and others are hybrids of both. Thus, energy-aware programming models, addressing various types of workloads and architectures, are needed in order to develop energy-efficient applications [17][18].
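Two first-order relations, standard in the literature and added here only for illustration, underlie many of the techniques above: the linear utilization model often used to idealize energy-proportional computing [14], and the CMOS dynamic-power law that motivates processor voltage and frequency scaling [11][12],

\[ P(u) \approx P_{\mathrm{idle}} + (P_{\mathrm{peak}} - P_{\mathrm{idle}})\, u, \quad 0 \le u \le 1, \qquad P_{\mathrm{dyn}} = \alpha C V^2 f , \]

where \(u\) is the component's utilization, \(P_{\mathrm{idle}}\) and \(P_{\mathrm{peak}}\) are its idle and peak power, \(\alpha\) is the switching activity, \(C\) the switched capacitance, \(V\) the supply voltage, and \(f\) the clock frequency. An ideal energy-proportional component has \(P_{\mathrm{idle}} \approx 0\), so that power tracks utilization; lowering \(f\) typically also permits lowering \(V\), so dynamic power falls roughly cubically, at the price of a longer execution time.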

All this shows that energy can be saved at all levels of the large-scale distributed system architecture. However, the essential step towards the introduction of energy efficiency in large-scale distributed systems is accurate, reliable, and continuous measurement of the energy consumption of each component in the system. In this paper, a brief survey of the current approaches for measuring and profiling power consumption in large-scale distributed systems is outlined. Furthermore, a case study of real-time power measurement in a cluster computing system is presented to demonstrate a practical implementation of the described approaches.

II. ENERGY MEASUREMENT AND PROFILING APPROACHES

The main energy measurement and profiling approaches can be classified as hardware-based approaches, software-based approaches, and hybrids of both. In the following subsections, the basic characteristics of hardware-based and software-based approaches are briefly outlined, based on two recent extensive survey papers [19][20].

A. Hardware-based Energy Measurement

Hardware-based energy measurement approaches rely on instruments that measure the current, voltage, or energy of different computation, communication, or storage segments, such as CPU racks of data centers, servers, or motherboards, to compute the energy consumption of the measured segment. These instruments can be implemented as meters, as special hardware devices usually embedded in hardware platforms, or as power sensors placed on the hardware.

Energy measurement with meters is a direct and straightforward approach. A digital multimeter is one example of a meter commonly used for such measurements. Digital multimeters can collect voltage or energy samples of the measured segment at pre-specified time intervals and send the measurements through a data link to the data collection system. Another meter used for energy measurement is the clamp meter, which, compared to a digital multimeter, has a larger measurement range and can be used to measure the energy of segments with much higher currents. The previously mentioned meters are mainly used to measure DC power, since they are connected between the power supply and the measured segment. On the other hand, there are meters (e.g., "Watts UP" [21]) that can also be used to measure AC power. However, such meters only measure system-level power, because only the power supplies are powered by AC.

In their simplicity, meters cannot be used when higher control of the measurement process is required. In those cases, more complex, specially designed devices must be used. In 2000, Viredaz et al. designed a platform called Itsy to measure the energy consumption of mobile devices [22]. Another example of a complex device used to measure energy is a single-board computer called PLEB [23]. PLEB is designed with a set of on-board current sensors, while the micro-controller of the device is integrated with an analogue-to-digital converter to read the sensors. Unlike Itsy, which measures only the energy consumption of mobile devices, PLEB can be used to measure the energy of processors, memories, flash drives, and I/O devices.
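As an illustration of how a stream of meter samples is turned into an energy figure, the following C++ sketch integrates power samples taken at a fixed interval using the trapezoidal rule; the function name, sample values, and sampling interval are our own illustrative assumptions, not tied to any particular meter:

#include <cstdio>
#include <vector>

// Convert a series of power samples (in watts), taken every dtSeconds by a
// meter, into energy (in joules) using the trapezoidal rule.
double energyFromSamples(const std::vector<double>& powerW, double dtSeconds) {
    double joules = 0.0;
    for (std::size_t i = 1; i < powerW.size(); ++i)
        joules += 0.5 * (powerW[i - 1] + powerW[i]) * dtSeconds;
    return joules;
}

int main() {
    // Hypothetical readings from a multimeter polled once per second.
    const std::vector<double> samples = {41.0, 40.5, 42.3, 44.1, 43.8};
    std::printf("consumed energy: %.1f J\n", energyFromSamples(samples, 1.0));
    return 0;
}

The same integration applies regardless of whether the samples come from a digital multimeter, a clamp meter, or an AC meter such as "Watts UP" [21].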

Finally, the approach of integrating sensors into the hardware is mainly used in high-performance servers. Sensors can be either built-in or external. On-chip energy or temperature sensors provide highly accurate solutions for monitoring energy, although these solutions are currently still expensive. Intel has provided Running Average Power Limit (RAPL) energy sensors in recent architectures such as Sandy Bridge for obtaining the energy consumption of the CPU [24]. On the other hand, external sensors that report the energy, temperature, humidity, or water hazards of servers are less expensive but also less accurate. For instance, in [25] the authors used a thermal camera as an external device for the remote temperature monitoring of a computer cluster at the blade enclosure level.

B. Software-based Energy Measurement

The disadvantages of hardware-based energy measurement techniques are that they depend on expensive hardware sensors and on considerable knowledge of hardware design. Given their need for an expensive and cumbersome setup, they are also not well suited for online monitoring and energy-aware algorithms (e.g., energy-aware scheduling algorithms). In contrast, software-based approaches can supply more fine-grained online energy information, with reduced accuracy, useful for augmenting energy-aware algorithms. Furthermore, compared to hardware-based approaches, software-based approaches are more flexible, because they can be applied to different platforms without changing the hardware.

A software-based approach usually builds power/energy models to estimate the energy consumption at the instruction level, program block level, process level, hardware component level, or full system level. These models are built by selecting and optimizing the power/energy indicators of a software or hardware component of interest. Depending on the type of power/energy indicators, software-based approaches can be classified as system profile-based approaches or hardware performance monitoring counter (PMC) based approaches.

A system profile, or system events, is a set of statistical performance information supplied by the operating system or by special software. These events can describe the current state of the hardware, software, and operating system. Joulemeter [26], for example, tracks the energy consumption of a virtual machine using power/energy models for three hardware components (CPU, memory, and disk). The CPU energy model uses CPU utilization, the memory energy model uses the number of last-level cache misses, and the disk energy model uses the number of bytes written to and read from disk.

Hardware performance counters are a group of special registers that store the counts of hardware-related activities within the computer system. In [27], PMC-based modeling approaches are separated into two groups: top-down approaches and bottom-up approaches. The top-down approaches do not depend on the modeled architecture, thus enabling fast and easy deployment [28]. On the other hand, the bottom-up approaches depend on the underlying architecture and produce more informative, responsive, and accurate power/energy models than the top-down approaches.

They enable a breakdown of the energy consumption per component of the architecture, but are more complex to deploy than the top-down approaches [29].
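To make the indicator-driven modeling concrete, the sketch below shows a linear power model in the spirit of Joulemeter [26], written in C++; the structure, names, and all coefficient values are hypothetical and would have to be calibrated against a hardware meter on the target platform:

#include <cstdio>

// Component power model: each hardware component contributes a term driven
// by one observable indicator (utilization or event rate). All coefficients
// below are hypothetical placeholders and must be fitted per platform.
struct PowerModel {
    double idleW     = 40.0;   // baseline (static) power
    double cpuCoeff  = 30.0;   // watts added at 100 % CPU utilization
    double llcCoeff  = 2e-7;   // watts per last-level-cache miss per second
    double diskCoeff = 5e-8;   // watts per byte of disk I/O per second

    double estimateWatts(double cpuUtil,       // 0.0 .. 1.0
                         double llcMissRate,   // misses per second
                         double diskBytesPerSec) const {
        return idleW + cpuCoeff * cpuUtil
                     + llcCoeff * llcMissRate
                     + diskCoeff * diskBytesPerSec;
    }
};

int main() {
    PowerModel model;
    // A hypothetical sample: 80 % CPU load, 10 M LLC misses/s, 50 MB/s disk I/O.
    std::printf("estimated power: %.1f W\n", model.estimateWatts(0.8, 1e7, 5e7));
    return 0;
}

In practice the coefficients would be fitted, e.g., by least squares, against readings from one of the hardware meters of Section II-A.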

III. USE CASE: REAL-TIME POWER MEASUREMENT OF A CLUSTER COMPUTING SYSTEM

The use case demonstrates real-time power monitoring of a multi-core computer, which is the main building block of computing clusters.

A. Benchmark Program

We use a real benchmark application written in C++ that implements the parallel maximum clique algorithm [30], which is essentially an advanced recursive program for searching a tree. The algorithm takes an input graph and returns its maximum fully connected sub-graph. The parallelization of the algorithm employs multithreading techniques, which are supported in most programming languages without extra libraries or other software support. This makes the algorithm portable to other languages and operating systems, and it can run on most modern multi-core computer architectures. All threads asynchronously search their branches in parallel, using a global variable that holds the size of the currently best clique. The algorithm produces a significant speedup, which is near-linear for some instances of input graphs. We can trade off the number of cores and various system clock frequencies to achieve minimal energy consumption. The essential parallel pattern is sketched below.
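The pattern of independent branch searches coordinated only through a shared bound can be illustrated in C++ as follows. This is our simplified sketch of the technique, not the implementation of [30]; the branch decomposition and the placeholder clique sizes are assumptions:

#include <algorithm>
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

// Shared size of the largest clique found so far; threads read it to prune
// branches and update it lock-free when they find a larger clique.
std::atomic<int> bestSize{0};

// Stand-in for the recursive branch-and-bound descent into one subtree; a
// real implementation would carry the input graph and the candidate set.
void searchBranch(int root) {
    int localBest = root % 7;  // placeholder for the clique size found here
    int cur = bestSize.load();
    while (localBest > cur && !bestSize.compare_exchange_weak(cur, localBest)) {
        // cur was refreshed by compare_exchange_weak; retry while larger.
    }
}

int main() {
    const unsigned nThreads = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::thread> workers;
    // Each thread asynchronously searches its share of the root branches.
    for (unsigned t = 0; t < nThreads; ++t)
        workers.emplace_back([t, nThreads] {
            for (int root = static_cast<int>(t); root < 64;
                 root += static_cast<int>(nThreads))
                searchBranch(root);
        });
    for (auto& w : workers) w.join();
    std::printf("best clique size found: %d\n", bestSize.load());
    return 0;
}

The lock-free compare-and-swap loop implements a "store maximum" update, so threads never block each other when publishing a better bound.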

B. Experimental Computing System

We evaluated the performance of the parallel maximum clique algorithm in terms of execution times and the corresponding power consumption. All results are shown as averages over 15 experiments. The experimental computing system was built from two 2.30 GHz Intel Xeon E5-2630 CPUs, each with six physical cores, and runs the server version of Ubuntu 12.04.

C. Real Time Power Monitoring

The latest architectural designs of processors for desktop computers and workstations provide Performance Monitoring Units (PMUs), hardware support for counting micro-architectural events of a processor core that are otherwise not explicitly visible to the outside world. In addition, the latest versions of PMUs can collect data from the parts of the processor that are outside the core, e.g., the memory controller, and can even measure the power consumption of the processor and the main dynamic memory.

The CPU performance measurements of our use case are accessed with the Intel Performance Counter Monitor (PCM) [31], a low-level library that enables access to the PMU of Intel processors with the Nehalem micro-architecture or newer. The PMU statistics can be measured at any point of the application program and/or during the whole execution. Although some statistics, such as cache hit rates, can be measured at the level of an individual core, the power consumption measurements are limited to individual processors and to the whole main memory. Detailed measurements of the power consumption are thus possible at the application level, provided the analyzed application does not share processors with other applications. The PCM library requires installation by the super-user and also requires super-user privileges for measuring power consumption. Besides the option of linking the library with the target application, a command-line tool "pcm-power.x" is provided, which enables power consumption measurements of custom applications. We use this tool for all of our measurements.

D. Experimental Results

We ran the parallel maximum clique algorithm on different numbers of cores and measured the execution time and the power consumption. The results are shown in Figure 1 and Figure 2, respectively. As expected, the execution time in Figure 1 drops approximately in proportion to the number of cores and to the system clock frequency. The power consumption in Figure 2 rises with the number of used cores and with the system clock frequency.

Figure 1. Execution time as a function of the number of cores.

Figure 2. Power consumption as a function of the number of cores.

From Figure 2 we also see that the power consumption of an unloaded system is significant: approximately 75 % and 50 % of the power consumption of a fully loaded system at 1.2 GHz and 2.3 GHz, respectively. The idle power consumption is partially due to the constant refreshing of the main dynamic memory (DRAM). The PCM measurements show that the idle consumption of the DRAM is 2.1 W, the same at 1.2 GHz and at 2.3 GHz, while it reaches 7.5 to 10 W when the memory is used by at least one core. With no active core, the power consumption of the experimental computing system is 7 W, but it increases to 40 W if at least a single core is active. We assume that this is partially because of the processor logic that is always active (e.g., the memory controller and cache), and partially due to the fact that the computing cores are not shut down when idle but rather perform some simple task, such as an endless loop. This prevents a more efficient optimization of the power consumption in real applications.

The energy consumed by our benchmark application equals the execution time multiplied by the power consumption and is plotted in Figure 3. We can see that it decreases at a rate similar to the execution time. Therefore, the optimal solution, regarding both the energy and the execution time, is to run on all available cores and at the highest possible frequency.

Figure 3. Energy consumption as a function of the number of cores.
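In symbols, the quantity plotted in Figure 3 is

\[ E(n, f) = t(n, f)\, \bar{P}(n, f), \]

where \(n\) is the number of cores, \(f\) the clock frequency, \(t\) the execution time, and \(\bar{P}\) the average power. As a back-of-the-envelope check (our own arithmetic, using the per-core figures reported in the next paragraph), adding one core multiplies the power by about 1.03 while roughly halving the execution time, so

\[ \frac{E_{n+1}}{E_n} \approx 1.03 \times 0.5 \approx 0.52, \]

which is why using more cores reduces the total energy.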

This unexpected result can be explained in the following way. The idle power consumption is significant, implying that the whole application should be executed in the shortest possible time. This can be achieved with all cores at the highest possible system clock frequency, because the speedup of the benchmark application is nearly ideal. In our use case, a shorter execution time contributes much more to a smaller energy consumption than a smaller number of computing cores does. Each additional core used by the application increases the power consumption by about 3 % and decreases the execution time by about 50 %. Therefore, the decrease in execution time provided by a core has a much higher impact on the overall energy consumption than the power consumption of the fully loaded core. However, if the efficiency of the tested parallel algorithm is not as high, in other words, if the speedup is not ideal, then the situation could change significantly.

To optimize the energy consumption more aggressively, we should be able to shut down the computing cores, or at least put them into a sleep state with a significantly lower idle power consumption. To apply energy consumption optimization to general applications, a profiling of their speedup is needed before the final run. These are the topics of our current research and future work.

IV. CONCLUSION

This paper provides a brief overview of different approaches for measuring and profiling power consumption in large-scale distributed systems. A simple practical use case study of real-time power measurement in a multi-core computing system is presented. The energy consumption of a custom parallel algorithm is measured, with the PCM library, as a function of the number of cores and the system clock frequency. The obtained measurements indicate that the optimal solution, regarding the energy consumption, is to run the parallel algorithm on all available cores and at the highest possible frequency.

ACKNOWLEDGMENT

The authors acknowledge the support of the bilateral scientific research project "Optimization of energy consumption in distributed computing systems", funded by the Ministry of Education, Science, Culture and Sport of the Republic of Slovenia and the Ministry of Science, Education and Sports of the Republic of Croatia. The project is in the framework of the joint Slovenian-Croatian cooperation in science and technology between the Jožef Stefan Institute (JSI) and the Ruđer Bošković Institute (RBI).

REFERENCES

[1] G. L. Valentini, W. Lassonde, S. U. Khan, N. Min-Allah, S. A. Madani, J. Li, L. Zhang, L. Wang, N. Ghani, J. Kolodziej, et al., "An overview of energy efficiency techniques in cluster computing systems," Cluster Computing, 2011, pp. 1-13.
[2] "CloudMan," http://usecloudman.org/.
[3] "StarCluster," http://web.mit.edu/stardev/cluster/.
[4] C. Vecchiola, S. Pandey, and R. Buyya, "High-performance cloud computing: a view of scientific applications," 10th International Symposium on Pervasive Systems, Algorithms, and Networks (ISPAN 2009), Piscataway, NJ, USA, 14-16 Dec. 2009, pp. 4-16.
[5] S. P. Ahuja and S. Mani, "The State of High Performance Computing in the Cloud," Journal of Emerging Trends in Computing and Information Sciences, vol. 3, February 2012, pp. 262-266.
[6] P. Mehrotra, J. Djomehri, S. Heistand, R. Hood, H. Jin, A. Lazanoff, S. Saini, and R. Biswas, "Performance evaluation of Amazon EC2 for NASA HPC applications," Proceedings of the 3rd Workshop on Scientific Cloud Computing, New York, USA, 2012, pp. 41-50.
[7] A.-C. Orgerie, L. Lefevre, and J.-P. Gelas, "Demystifying energy consumption in Grids and Clouds," International Green Computing Conference, 15-18 Aug. 2010, pp. 335-342.
[8] A. Beloglazov, R. Buyya, Y. C. Lee, and A. Zomaya, "A taxonomy and survey of energy-efficient data centers and cloud computing systems," Advances in Computers, vol. 82, no. 2, 2011, pp. 47-111.
[9] C. Cai, L. Wang, S. U. Khan, and J. Tao, "Energy-Aware High Performance Computing: A Taxonomy Study," IEEE 17th International Conference on Parallel and Distributed Systems (ICPADS), 2011, pp. 953-958.
[10] J. Shuja, S. A. Madani, K. Bilal, K. Hayat, S. U. Khan, and S. Sarwar, "Energy-efficient data centers," Computing, vol. 94, no. 12, 2012, pp. 973-994.
[11] Y. Liu and Z. Hong, "A survey of the research on power management techniques for high-performance systems," Software: Practice and Experience, vol. 40, no. 11, 2010, pp. 943-964.
[12] N. B. Rizvandi and A. Y. Zomaya, "A Primarily Survey on Energy Efficiency in Cloud and Distributed Computing Systems," arXiv preprint arXiv:1210.4690, 2012.
[13] A. Y. Zomaya and C. L. Young, "Energy-efficient distributed computing," Awareness Magazine, January 2012.
[14] L. A. Barroso and U. Hölzle, "The Case for Energy-Proportional Computing," Computer, 2007, pp. 33-37.
[15] K. Bilal, S. U. Khan, S. A. Madani, K. Hayat, M. I. Khan, N. Min-Allah, J. Kolodziej, L. Wang, S. Zeadally, and D. Chen, "A survey on Green communications using Adaptive Link Rate," Cluster Computing, 2012, pp. 1-15.
[16] A. P. Bianzino, C. Chaudet, D. Rossi, and J.-L. Rougier, "A survey of green networking research," IEEE Communications Surveys & Tutorials, vol. 14, no. 1, 2012, pp. 3-20.
[17] C. Zhang, K. Huang, X. Cui, and Y. Chen, "Power-aware Programming with GPU Accelerators," IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 21-25 May 2012, pp. 2443-2449.
[18] C. Lively, X. Wu, V. Taylor, S. Moore, H.-C. Chang, C.-Y. Su, and K. Cameron, "Power-aware predictive models of hybrid (MPI/OpenMP) scientific applications on multicore systems," Computer Science - Research and Development, vol. 27, no. 4, 2012, pp. 245-253.
[19] S. Benedict, "Energy-aware performance analysis methodologies for HPC architectures—An exploratory study," Journal of Network and Computer Applications, vol. 35, no. 6, November 2012.
[20] H. Chen and S. Weisong, "Power Measuring and Profiling: State-of-the-Art."
[21] "Watts up," https://www.wattsupmeters.com.
[22] M. A. Viredaz and D. A. Wallach, "Power evaluation of Itsy version 2.3," Technical report, Compaq Western Research Laboratory, 2000.
[23] D. C. Snowdon, S. M. Petters, and G. Heiser, "Power measurement as the basis for power management," in Operating System Platforms for Embedded Systems, 2005.
[24] Intel, "Running Average Power Limit for Xeon processors," http://www.intel.com/xeon.
[25] D. Kolarić, T. Lipić, I. Grubišić, L. Gjenero, and K. Skala, "Application of Infrared Thermal Imaging in Blade System Temperature Monitoring," Proceedings of the 53rd International Symposium ELMAR, Zadar, 2011, pp. 309-312.
[26] A. Kansal, F. Zhao, J. Liu, N. Kothari, and A. A. Bhattacharya, "Virtual machine power metering and provisioning," Proceedings of the 1st ACM Symposium on Cloud Computing, ACM, 2010, pp. 39-50.
[27] R. Bertran, M. Gonzàlez, X. Martorell, N. Navarro, and E. Ayguadé, "Counter-Based Power Modeling Methods: Top-Down vs. Bottom-Up," The Computer Journal, vol. 56, no. 2, pp. 198-213.
[28] W. L. Bircher and L. K. John, "Complete system power estimation using processor performance events," IEEE Transactions on Computers, vol. 61, no. 4, 2012, pp. 563-577.
[29] R. Bertran, M. Gonzalez Tallada, X. Martorell, N. Navarro, and E. Ayguade, "A Systematic Methodology to Generate Decomposable and Responsive Power Models for CMPs," IEEE Transactions on Computers, vol. PP, no. 99, pp. 1-14.
[30] J. Konc and D. Janezic, "An improved branch and bound algorithm for the maximum clique problem," MATCH Commun. Math. Comput. Chem., no. 58, 2007, pp. 569-590.
[31] R. Dementiev, T. Willhalm, O. Bruggeman, P. Fay, P. Ungerer, A. Ott, P. Lu, J. Harris, and P. Kerly, "Intel® Performance Counter Monitor - a better way to measure CPU utilization," 2012, http://software.intel.com/en-us/articles/intel-performance-counter-monitor.
