EASE: Energy-Aware Autotuning for Scientific Applications
Bilateral Call: India
Disciplines
Computer Sciences (100%)
Keywords
Parallelizing Compiler,
Energy Efficient Programming,
Autotuning,
Performance Analysis
Scientific applications require ever larger computing and storage resources to solve large-scale simulations of increasing complexity. In recent years, however, energy-conscious design of HPC applications has moved into the focus of application developers. HPC researchers, application developers, and architecture designers have become interested in the Green500 list of supercomputers alongside the traditional Top500 list, mainly because of rising electricity costs and CO2 emissions. According to a report submitted to the US Congress on Server and Data Center Energy Efficiency in 2007, US data centers consumed 61 billion kilowatt-hours of energy in 2006, at a cost of USD 4.5 billion. Energy costs are predicted to rise further in the coming years unless countermeasures are taken at all levels, including the operating system, kernel, and application. It is well known that the majority of HPC applications have poor energy efficiency, for instance due to long wait times in pipelines or caches, or due to inter-task load imbalances in message passing. Although application developers are aware of the need to reduce electricity bills and carbon emissions, they find it difficult to pinpoint the exact code regions that cause excessive energy consumption. In fact, obtaining a clear picture of the energy consumption of code regions in scientific applications is a challenge, owing to the inaccuracy of energy measurements with existing hardware and software solutions: accuracy degrades for fine-grained code regions because the sampling frequency of RAPL counters is comparatively low. Furthermore, tools and techniques are needed to optimize the amount of energy required to solve various scientific problems while minimizing the impact on execution time.
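To illustrate the measurement problem, region-level energy readings on Intel platforms typically come from RAPL counters, exposed on Linux through the powercap sysfs interface. The sketch below is a minimal illustration, not part of the EASE tooling; the sysfs paths are kernel-dependent assumptions, and the counter is a microjoule value that periodically wraps around, which any region-level measurement must handle:

```python
# Sketch: sampling the RAPL package-energy counter via the Linux powercap
# interface. Paths are assumptions (kernel- and platform-dependent); the
# counter reports cumulative microjoules and overflows periodically.

RAPL_PKG = "/sys/class/powercap/intel-rapl:0/energy_uj"
RAPL_MAX = "/sys/class/powercap/intel-rapl:0/max_energy_range_uj"

def read_uj(path=RAPL_PKG):
    with open(path) as f:
        return int(f.read())

def energy_joules(before_uj, after_uj, max_range_uj):
    """Energy in joules between two samples, correcting one counter wraparound."""
    delta = after_uj - before_uj
    if delta < 0:              # the counter overflowed between the two samples
        delta += max_range_uj
    return delta / 1e6

def measure_region(region, read=read_uj):
    """Run a code region and return its package energy in joules."""
    max_range = int(open(RAPL_MAX).read())
    start = read()
    region()                   # the code region under test
    return energy_joules(start, read(), max_range)
```

Note that if a fine-grained region completes between two RAPL counter updates, both samples may return the same value, which is exactly the granularity limitation described above.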
Numerous models have been built based on hardware-counter data, time intervals, dynamic programming, and machine learning. All of these approaches explore concurrency throttling (e.g. changing the number of threads per code region) and/or DVFS (dynamic voltage and frequency scaling) to dilate computation into slack (any non-overlapped hardware or algorithmic latency) or to find effective clock-frequency settings for code regions. They do not explore code changes, iterative compilation, or auto-tuning, which modern compiler technologies and HPC systems can apply to influence both execution time and energy consumption and to widen the search space for efficient time/energy trade-offs. EASE (Energy-Aware Auto-Tuning for Scientific Applications) will introduce a novel approach that combines performance prediction and analysis with compiler and online technologies to support multi-objective auto-tuning for hybrid programming models that use both message passing and shared memory. EASE will be demonstrated for three objectives: execution time, energy, and efficiency.
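The core of such multi-objective auto-tuning can be sketched as a search over tuning parameters (e.g. thread count and clock frequency) that keeps the Pareto-optimal time/energy trade-offs rather than a single optimum. This is a generic illustration under stated assumptions, not the EASE algorithm itself; the `benchmark` callable is a hypothetical stand-in for real per-configuration measurements:

```python
# Sketch: exhaustive multi-objective auto-tuning over (threads, frequency)
# configurations, returning the Pareto front of (time, energy) trade-offs.
# The benchmark function is a hypothetical stand-in for real measurements.

from itertools import product

def pareto_front(points):
    """Keep (config, time, energy) tuples not dominated by any other point."""
    return [(c, t, e) for c, t, e in points
            if not any(t2 <= t and e2 <= e and (t2 < t or e2 < e)
                       for _, t2, e2 in points)]

def autotune(benchmark, thread_counts, freqs_ghz):
    """Benchmark every configuration; benchmark(n, f) -> (time_s, energy_j)."""
    results = [((n, f), *benchmark(n, f))
               for n, f in product(thread_counts, freqs_ghz)]
    return pareto_front(results)
```

A real tuner would prune this search space (the abstract's point about widening it via code changes and iterative compilation makes exhaustive enumeration quickly infeasible), but the dominance criterion stays the same.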
Scientific applications require ever more computing and storage capacity to solve large and complex simulations. It is known that the majority of high-performance computing applications have poor energy efficiency, e.g. because of wait times in pipelines or caches, or because of load imbalances caused by message exchange. Although application developers are aware of the need to reduce power consumption and CO2 emissions, it is difficult to find the code regions that lead to unacceptable energy demands. In addition, tools and techniques are needed to optimize the energy required to solve various scientific problems while minimizing the impact on runtime. The EASE (Energy-Aware Autotuning for Scientific Applications) project has developed a novel approach to support multi-objective auto-tuning for hybrid programming models on shared- and distributed-memory computers. EASE has been evaluated for three optimization goals: runtime, energy, and efficiency. For this purpose, a compiler for C++ programs has been developed that allows modifying the degree of parallelism (e.g. changing the number of threads per region) and/or applying DVFS (dynamic voltage and frequency scaling) to dilate computations into slack (non-overlapped hardware or algorithmic latency) or to find efficient clock rates per region. Experiments with different programs on two parallel computers have yielded performance improvements of up to a factor of 10. In addition, runtime- and energy-estimation techniques have been developed that lead to a better understanding of the energy and runtime behaviour of parallel programs, thus enabling targeted analysis and control of the optimization. Experiments with different parallel codes have achieved energy and runtime prediction accuracies of up to 86% and 94%, respectively.
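The per-region DVFS control mentioned above can be illustrated with the Linux cpufreq sysfs interface. This is a generic sketch, not the project's implementation; it assumes the `userspace` governor is active and root privileges are available, and the sysfs paths are kernel-dependent assumptions. A requested frequency must be snapped to one of the P-states the hardware actually supports:

```python
# Sketch: setting a per-core clock frequency via the Linux cpufreq sysfs
# interface (userspace governor and root privileges assumed; paths are
# kernel-dependent assumptions).

CPUFREQ = "/sys/devices/system/cpu/cpu{cpu}/cpufreq"

def nearest_frequency(target_khz, available_khz):
    """Snap the requested frequency to the closest supported P-state."""
    return min(available_khz, key=lambda f: abs(f - target_khz))

def set_frequency(cpu, target_khz):
    base = CPUFREQ.format(cpu=cpu)
    with open(f"{base}/scaling_available_frequencies") as f:
        available = [int(x) for x in f.read().split()]
    freq = nearest_frequency(target_khz, available)
    with open(f"{base}/scaling_setspeed", "w") as f:   # requires root
        f.write(str(freq))
    return freq
```

A region-aware tuner would call such a setter at region entry and restore the previous frequency at region exit, so that the slower clock only covers the slack it is meant to fill.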
- Universität Innsbruck - 100%
- Bernd Mohr, Forschungszentrum Jülich - Germany
- Michael Gerndt, Technische Universität München - Germany
- Shajulin Benedict, St. Xavier's Catholic College of Engineering - India
- Laura N Carrington, San Diego Supercomputer Center - USA
- Kirk W Cameron, Virginia Polytechnic Institute and State University - USA
Research Output
- 171 Citations
- 9 Publications
- 2016: De Maio V, "Modelling energy consumption of network transfers and virtual machine migration", Future Generation Computer Systems, pp. 388-406. DOI: 10.1016/j.future.2015.07.007 (Journal Article)
- 2015: Janetschek M, "A workflow runtime environment for manycore parallel architectures", pp. 1-12. DOI: 10.1145/2822332.2822333 (Conference Proceeding)
- 2017: Janetschek M, "A workflow runtime environment for manycore parallel architectures", Future Generation Computer Systems, pp. 330-347. DOI: 10.1016/j.future.2017.02.029 (Journal Article)
- 2014: Durillo J, "From Single- to Multi-Objective Auto-Tuning of Programs: Advantages and Implications", Scientific Programming, pp. 285-297. DOI: 10.1155/2014/818579 (Journal Article)
- 2017: Pham T, "Predicting Workflow Task Execution Time in the Cloud Using A Two-Stage Machine Learning Approach", IEEE Transactions on Cloud Computing, pp. 256-268. DOI: 10.1109/tcc.2017.2732344 (Journal Article)
- 2017: Thoman P, "Task-parallel Runtime System Optimization Using Static Compiler Analysis", pp. 201-210. DOI: 10.1145/3075564.3075574 (Conference Proceeding)
- 2017: Zangerl P, "Characterizing Performance and Cache Impacts of Code Multi-Versioning on Multicore Architectures", pp. 209-213. DOI: 10.1109/pdp.2017.77 (Conference Proceeding)
- 2017: Kofler K, "A Region-Aware Multi-Objective Auto-Tuner for Parallel Programs", pp. 190-199. DOI: 10.1109/icppw.2017.37 (Conference Proceeding)
- 2015: Benedict S, "Energy Prediction of OpenMP Applications Using Random Forest Modeling Approach", pp. 1251-1260. DOI: 10.1109/ipdpsw.2015.12 (Conference Proceeding)