EASE: Energy-Aware Autotuning for Scientific Applications
Bilateral Call: India
Disciplines
Computer Sciences (100%)
Keywords
Parallelizing Compiler,
Energy Efficient Programming,
Autotuning,
Performance Analysis
Scientific applications require ever larger computing and storage resources to solve large-scale simulations of increasing complexity. In recent years, however, energy-conscious design of HPC applications has moved into the focus of application developers. HPC researchers, application developers, and architecture designers have become interested in the Green500 list of supercomputers alongside the traditional Top500 list, mainly because of rising electricity costs and CO2 emissions. According to a report submitted to the US Congress on Server and Data Center Energy Efficiency in 2007, US data centers consumed 61 billion kilowatt-hours of energy in 2006, at a cost of USD 4.5 billion. Energy costs are predicted to rise further in the coming years unless countermeasures are taken at all levels, including the operating system, kernel, and application. It is well known that the majority of HPC applications have poor energy efficiency, for instance due to long wait times in pipelines or caches, or due to inter-task load imbalances in message passing. Although application developers are aware of the need to reduce electricity bills and carbon emissions, they find it difficult to pinpoint the exact code regions that cause excessive energy consumption. In fact, obtaining a clear picture of the energy consumption of code regions in scientific applications is a challenge, owing to the inaccuracy of energy measurements with existing hardware and software solutions: accuracy degrades for fine-grained code regions because the sampling frequency of RAPL counters is comparatively low. Furthermore, tools and techniques are needed to optimize the amount of energy required to solve various scientific problems while minimizing the impact on execution time.
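To illustrate the measurement problem, region-level energy readings on Intel platforms typically come from RAPL counters, exposed on Linux through the powercap sysfs interface. The sketch below is a minimal illustration, not part of the EASE tooling; the sysfs paths are kernel-dependent assumptions, and the counter is a microjoule value that periodically wraps around, which any region-level measurement must handle:

```python
# Sketch: sampling the RAPL package-energy counter via the Linux powercap
# interface. Paths are assumptions (kernel- and platform-dependent); the
# counter reports cumulative microjoules and overflows periodically.

RAPL_PKG = "/sys/class/powercap/intel-rapl:0/energy_uj"
RAPL_MAX = "/sys/class/powercap/intel-rapl:0/max_energy_range_uj"

def read_uj(path=RAPL_PKG):
    with open(path) as f:
        return int(f.read())

def energy_joules(before_uj, after_uj, max_range_uj):
    """Energy in joules between two samples, correcting one counter wraparound."""
    delta = after_uj - before_uj
    if delta < 0:              # the counter overflowed between the two samples
        delta += max_range_uj
    return delta / 1e6

def measure_region(region, read=read_uj):
    """Run a code region and return its package energy in joules."""
    max_range = int(open(RAPL_MAX).read())
    start = read()
    region()                   # the code region under test
    return energy_joules(start, read(), max_range)
```

Note that if a fine-grained region completes between two RAPL counter updates, both samples may return the same value, which is exactly the granularity limitation described above.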
Numerous models have been built based on hardware-counter data, time intervals, dynamic programming, and machine learning. All of these approaches explore concurrency throttling (e.g. changing the number of threads per code region) and/or DVFS (dynamic voltage and frequency scaling) to dilate computation into slack (any non-overlapped hardware or algorithmic latency) or to find effective clock-frequency settings for code regions. They do not explore code changes, iterative compilation, or auto-tuning, which modern compiler technologies and HPC systems can apply to influence both execution time and energy consumption and to widen the search space for efficient time/energy trade-offs. EASE (Energy-Aware Auto-Tuning for Scientific Applications) will introduce a novel approach that combines performance prediction and analysis with compiler and online technologies to support multi-objective auto-tuning for hybrid programming models that use both message passing and shared memory. EASE will be demonstrated for three objectives: execution time, energy, and efficiency.
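The core of such multi-objective auto-tuning can be sketched as a search over tuning parameters (e.g. thread count and clock frequency) that keeps the Pareto-optimal time/energy trade-offs rather than a single optimum. This is a generic illustration under stated assumptions, not the EASE algorithm itself; the `benchmark` callable is a hypothetical stand-in for real per-configuration measurements:

```python
# Sketch: exhaustive multi-objective auto-tuning over (threads, frequency)
# configurations, returning the Pareto front of (time, energy) trade-offs.
# The benchmark function is a hypothetical stand-in for real measurements.

from itertools import product

def pareto_front(points):
    """Keep (config, time, energy) tuples not dominated by any other point."""
    return [(c, t, e) for c, t, e in points
            if not any(t2 <= t and e2 <= e and (t2 < t or e2 < e)
                       for _, t2, e2 in points)]

def autotune(benchmark, thread_counts, freqs_ghz):
    """Benchmark every configuration; benchmark(n, f) -> (time_s, energy_j)."""
    results = [((n, f), *benchmark(n, f))
               for n, f in product(thread_counts, freqs_ghz)]
    return pareto_front(results)
```

A real tuner would prune this search space (the abstract's point about widening it via code changes and iterative compilation makes exhaustive enumeration quickly infeasible), but the dominance criterion stays the same.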
Scientific applications require ever more computing and storage capacity to solve large and complex simulations. It is known that the majority of high-performance computing applications have poor energy efficiency, e.g. because of wait times in pipelines or caches, or because of load imbalances caused by message exchange. Although application developers are aware of the need to reduce power consumption and CO2 emissions, it is difficult to find the code regions that lead to unacceptable energy demands. In addition, tools and techniques are needed to optimize the energy required to solve various scientific problems while minimizing the impact on runtime. The EASE (Energy-Aware Autotuning for Scientific Applications) project has developed a novel approach to support multi-objective auto-tuning for hybrid programming models on shared- and distributed-memory computers. EASE has been evaluated for three optimization goals: runtime, energy, and efficiency. For this purpose, a compiler for C++ programs has been developed that allows modifying the degree of parallelism (e.g. changing the number of threads per region) and/or applying DVFS (dynamic voltage and frequency scaling) to dilate computations into slack (non-overlapped hardware or algorithmic latency) or to find efficient clock rates per region. Experiments with different programs on two parallel computers have yielded performance improvements of up to a factor of 10. In addition, runtime- and energy-estimation techniques have been developed that lead to a better understanding of the energy and runtime behaviour of parallel programs, thus enabling targeted analysis and control of the optimization. Experiments with different parallel codes have achieved energy and runtime prediction accuracies of up to 86% and 94%, respectively.
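The per-region DVFS control mentioned above can be illustrated with the Linux cpufreq sysfs interface. This is a generic sketch, not the project's implementation; it assumes the `userspace` governor is active and root privileges are available, and the sysfs paths are kernel-dependent assumptions. A requested frequency must be snapped to one of the P-states the hardware actually supports:

```python
# Sketch: setting a per-core clock frequency via the Linux cpufreq sysfs
# interface (userspace governor and root privileges assumed; paths are
# kernel-dependent assumptions).

CPUFREQ = "/sys/devices/system/cpu/cpu{cpu}/cpufreq"

def nearest_frequency(target_khz, available_khz):
    """Snap the requested frequency to the closest supported P-state."""
    return min(available_khz, key=lambda f: abs(f - target_khz))

def set_frequency(cpu, target_khz):
    base = CPUFREQ.format(cpu=cpu)
    with open(f"{base}/scaling_available_frequencies") as f:
        available = [int(x) for x in f.read().split()]
    freq = nearest_frequency(target_khz, available)
    with open(f"{base}/scaling_setspeed", "w") as f:   # requires root
        f.write(str(freq))
    return freq
```

A region-aware tuner would call such a setter at region entry and restore the previous frequency at region exit, so that the slower clock only covers the slack it is meant to fill.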
- Universität Innsbruck - 100%
- Bernd Mohr, Forschungszentrum Jülich - Germany
- Michael Gerndt, Technische Universität München - Germany
- Shajulin Benedict, St. Xavier's Catholic College of Engineering - India
- Laura N Carrington, San Diego Supercomputer Center - USA
- Kirk W Cameron, Virginia Polytechnic Institute and State University - USA
Research Output
- 171 Citations
- 9 Publications
- 2016: De Maio V, "Modelling energy consumption of network transfers and virtual machine migration", Future Generation Computer Systems, pp. 388-406. DOI: 10.1016/j.future.2015.07.007 (Journal Article)
- 2015: Janetschek M, "A workflow runtime environment for manycore parallel architectures", pp. 1-12. DOI: 10.1145/2822332.2822333 (Conference Proceeding)
- 2017: Janetschek M, "A workflow runtime environment for manycore parallel architectures", Future Generation Computer Systems, pp. 330-347. DOI: 10.1016/j.future.2017.02.029 (Journal Article)
- 2014: Durillo J, "From Single- to Multi-Objective Auto-Tuning of Programs: Advantages and Implications", Scientific Programming, pp. 285-297. DOI: 10.1155/2014/818579 (Journal Article)
- 2017: Pham T, "Predicting Workflow Task Execution Time in the Cloud Using A Two-Stage Machine Learning Approach", IEEE Transactions on Cloud Computing, pp. 256-268. DOI: 10.1109/tcc.2017.2732344 (Journal Article)
- 2017: Thoman P, "Task-parallel Runtime System Optimization Using Static Compiler Analysis", pp. 201-210. DOI: 10.1145/3075564.3075574 (Conference Proceeding)
- 2017: Zangerl P, "Characterizing Performance and Cache Impacts of Code Multi-Versioning on Multicore Architectures", pp. 209-213. DOI: 10.1109/pdp.2017.77 (Conference Proceeding)
- 2017: Kofler K, "A Region-Aware Multi-Objective Auto-Tuner for Parallel Programs", pp. 190-199. DOI: 10.1109/icppw.2017.37 (Conference Proceeding)
- 2015: Benedict S, "Energy Prediction of OpenMP Applications Using Random Forest Modeling Approach", pp. 1251-1260. DOI: 10.1109/ipdpsw.2015.12 (Conference Proceeding)