Project detail

Disciplines

Computer Sciences (100%)

Keywords

Compiler, Parallel, Heterogeneous, Machine Learning, Modelling, Optimisation

Abstract

The efficient mapping of program parallelism to multi-core, heterogeneous processors is extremely challenging and highly dependent on the underlying architecture. The overall objective of this project is to investigate a novel parallel compiler approach that can automatically learn how best to map program parallelism to multi-core, heterogeneous platforms. Rather than hard-coding a compiler strategy for each parallel platform, we aim to explore an innovative, portable, parallel compiler approach that can automatically self-adapt to any heterogeneous hardware and can improve its performance over time. This is achieved by employing machine learning approaches which first learn the optimisation space off-line and then automatically derive a strategy that attempts to generate the "best" mapping for any user program. This predictive modelling approach can be further extended to on-line adaptation to manage contention for resources. This project is aimed at exploring multi-core system software that is "future scalable" and, if successful, will have a wide range of applications, from high performance computing to desktops to embedded mobile devices. It will allow portable performance of parallel programs across platforms. Given the increasing prevalence of heterogeneous multi-cores, such an approach, if successful, will have international academic impact and be of significant benefit to EU industry.

Multi-core processors are commonplace and are widely seen as the most viable means of delivering performance with increasing transistor densities. Concurrent with this rise in multi-cores has been the adoption of specialised or customised cores such as general-purpose graphics processing units (GP-GPUs). Unfortunately, there is no clear way for application software to harness this performance. Current compiler technology, whose role is to map software to the underlying hardware, is simply incapable of doing this: it requires significant manual intervention from the programmer, who must tune for each system. As the number and diversity of cores increases, this software gap grows larger and will be the fundamental critical issue for computing systems in five years' time.

Final report

The efficient parallelization and optimization of computer programs on modern heterogeneous parallel computers is extremely challenging and highly dependent on the underlying architecture. The overall objective of this project was to investigate a novel parallel compiler approach that can automatically learn how best to map program parallelism to modern parallel computers with GPUs (graphics processing units), which are widely available in consumer, industry and scientific computers. Rather than hard-coding a compiler strategy for each parallel computer, we aimed to explore an innovative, portable, parallel compiler approach that can automatically self-adapt to any heterogeneous hardware and can improve its performance over time. This has been achieved by employing machine learning approaches which first learn the optimization space off-line and then automatically derive a strategy that attempts to generate the best mapping for any user program.

For our project we focused on OpenCL, which is the first open standard for cross-platform parallel computing. Writing programs for heterogeneous systems is a challenging task because the available computational resources differ in processing capability, memory availability and communication latency. Our research conducted as part of this project included load balancing of OpenCL tasks over heterogeneous compute nodes. The best-performing task partitioning typically depends on the application, the input data and the machine. We developed a new solution based on machine learning, for which we demonstrated a performance improvement of approximately 25% over previous approaches across a large variety of test codes. A crucial component of this solution was a prediction model, built on artificial neural networks and principal component analysis, that estimates an efficient task partitioning and thereby guides our compiler-based optimization. This approach has been implemented as part of the Insieme source-to-source compiler, developed and maintained by the University of Innsbruck. Furthermore, for the first time, we extended OpenCL to work on clusters of GPUs as well, and achieved an efficiency of 64% for a variety of codes.

The main development goals were to improve performance and to raise productivity by simplifying program development. This project aimed at exploring multi-core system software that is future scalable and has a wide range of applications, from high performance computing to desktops to embedded mobile devices. It allows portable performance of parallel programs across platforms. Given the increasing prevalence of heterogeneous multi-cores on a large variety of computers, our work and results are a good starting point for using heterogeneous computers effectively, with reduced execution times and increased productivity for parallel programs.
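
To make the term "task partitioning" concrete, the following Python sketch splits a one-dimensional range of work items into contiguous chunks, one per device, according to a vector of shares. It is only an illustration under stated assumptions and not the Insieme implementation: the function names, the share values and the idea that a prediction model outputs one fraction per device are all hypothetical. The second function uses the common textbook definition of parallel efficiency (speedup divided by the number of devices); the report does not say how its efficiency figure was measured.

```python
from typing import List, Tuple


def partition_range(n_items: int, shares: List[float]) -> List[Tuple[int, int]]:
    """Split work items 0..n_items-1 into contiguous chunks, one per device.

    shares[i] is the fraction of the work given to device i (a hypothetical
    model output). Returns half-open (start, end) index ranges that together
    cover the whole range without gaps or overlap.
    """
    if n_items < 0:
        raise ValueError("n_items must be non-negative")
    if not shares or any(s < 0 for s in shares) or abs(sum(shares) - 1.0) > 1e-9:
        raise ValueError("shares must be non-negative and sum to 1")

    bounds = [0]
    cumulative = 0.0
    for share in shares:
        cumulative += share
        # Round the cumulative boundary so chunk sizes track the shares.
        bounds.append(min(n_items, round(cumulative * n_items)))
    bounds[-1] = n_items  # guard against floating-point drift in the last bound
    return list(zip(bounds[:-1], bounds[1:]))


def parallel_efficiency(t_single: float, t_parallel: float, n_devices: int) -> float:
    """Speedup (t_single / t_parallel) divided by the number of devices."""
    if t_single <= 0 or t_parallel <= 0 or n_devices <= 0:
        raise ValueError("times and device count must be positive")
    return t_single / (t_parallel * n_devices)


if __name__ == "__main__":
    # Hypothetical shares for three devices, e.g. one CPU and two GPUs.
    print(partition_range(1000, [0.5, 0.3, 0.2]))
    # Hypothetical timings: 10 s on one device, 4 s on four devices.
    print(parallel_efficiency(10.0, 4.0, 4))
```

In an OpenCL setting, each (start, end) pair would correspond to the offset and size of the sub-range launched on one device. Because the best shares change with the application, the input data and the machine, they are passed in as a parameter here rather than fixed.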

Research institution(s)

Universität Innsbruck - 100%

Research Output

323 Citations
18 Publications

Publications

Title	An automatic input-sensitive approach for heterogeneous task partitioning
DOI	10.1145/2464996.2465007
Type	Conference Proceeding Abstract
Author	Kofler K
Pages	149-160

Title	The JavaSymphony Extensions for Parallel GPU Computing
DOI	10.1109/icpp.2012.56
Type	Conference Proceeding Abstract
Author	Aleem M
Pages	30-39

Title	Automatic OpenMP Loop Scheduling: A Combined Compiler and Runtime Approach
DOI	10.1007/978-3-642-30961-8_7
Type	Book Chapter
Author	Thoman P
Publisher	Springer Nature
Pages	88-101

Title	OpenMP in a Heterogeneous World, 8th International Workshop on OpenMP, IWOMP 2012, Rome, Italy, June 11-13, 2012. Proceedings
DOI	10.1007/978-3-642-30961-8
Type	Book
Publisher	Springer Nature

Title	SAMPO: an Agent-based mosquito point model in OpenCL.
Type	Conference Proceeding Abstract
Author	Gesing S et al.
Conference	Proceedings of the 2014 Symposium on Agent Directed Simulation.

Title	Random Fields Generation on the GPU with the Spectral Turning Bands Method
DOI	10.1007/978-3-319-09873-9_55
Type	Book Chapter
Author	Hunger L
Publisher	Springer Nature
Pages	656-667

Title	A Multi-Objective Auto-Tuning Framework for Parallel Codes
DOI	10.1109/sc.2012.7
Type	Conference Proceeding Abstract
Author	Jordan H
Pages	1-12

Title	Visual Data Mining Using the Point Distribution Tensor.
Type	Conference Proceeding Abstract
Author	Leimer W et al.

Title	A multi-objective auto-tuning framework for parallel codes.
Type	Conference Proceeding Abstract
Author	Moritsch H et al.

Title	Low-Latency Collectives for the Intel SCC
DOI	10.1109/cluster.2012.58
Type	Conference Proceeding Abstract
Author	Kohler A
Pages	346-354

Title	Tuning MPI Runtime Parameter Setting for High Performance Computing
DOI	10.1109/clusterw.2012.15
Type	Conference Proceeding Abstract
Author	Pellegrini S
Pages	213-221

Title	Automatic problem size sensitive task partitioning on heterogeneous parallel systems
DOI	10.1145/2442516.2442545
Type	Conference Proceeding Abstract
Author	Grasso I
Pages	281-282

Title	Kd-tree based N-Body Simulations with Volume-Mass Heuristic on the GPU
DOI	10.1109/ipdpsw.2014.141
Type	Conference Proceeding Abstract
Author	Kofler K
Pages	1256-1265

Title	Energy Efficient HPC on Embedded SoCs: Optimization Techniques for Mali GPU
DOI	10.1109/ipdps.2014.24
Type	Conference Proceeding Abstract
Author	Grasso I
Pages	123-132

Title	A uniform approach for programming distributed heterogeneous computing systems
DOI	10.1016/j.jpdc.2014.08.002
Type	Journal Article
Author	Grasso I
Journal	Journal of Parallel and Distributed Computing
Pages	3228-3239

Title	Performance Analysis and Benchmarking of the Intel SCC
DOI	10.1109/cluster.2011.24
Type	Conference Proceeding Abstract
Author	Gschwandtner P
Pages	139-149

Title	LibWater
DOI	10.1145/2464996.2465008
Type	Conference Proceeding Abstract
Author	Grasso I
Pages	161-172

Title	LibWater: heterogeneous distributed computing made easy.
Type	Conference Proceeding Abstract
Author	Fahringer T et al.

Automatic Portable Performance for Heterogeneous Multi-cores
