Project detail

Grant DOI	10.55776/P29783
Funding program	Principal Investigator Projects
Status	ended
Start	April 1, 2017
End	March 31, 2021
Funding amount	€ 317,835

Disciplines

Computer Sciences (100%)

Keywords

Runtime Systems, Parallel Programming, Parallel Architectures, High Performance Computing, Multicore Architectures

Abstract

Over the last decade, the architecture of computing systems has undergone significant changes. Multi-core processors are now ubiquitous, and only software that is properly parallelized will run faster on new processor generations. The need for more energy-efficient systems is now driving another disruptive technology shift, towards heterogeneous multi-core systems that combine different types of execution units, often with dynamically varying performance characteristics. Heterogeneous multi-core architectures add yet another level of complexity to the development of efficient parallel software, a task that only a diminishing number of expert programmers can cope with. Given the rapid innovation cycles in processor architectures as well as the complexity and wide variety of heterogeneous parallel systems, the current approach of manually adapting and optimizing applications for each parallel architecture leads to very high software costs and is therefore becoming unsustainable. A new approach to the development of parallel applications is needed, in which programmers specify only the potential parallelism in their programs, not how parallel execution is implemented on a specific system. Key to such an approach is a dynamic runtime system that can adapt programs to a specific architecture at runtime. In this project, our goal is to develop new methods for dynamic runtime systems that facilitate the development of efficient applications for current and future heterogeneous parallel architectures and improve performance portability across different types of heterogeneous systems. Our work will be based on the Open Community Runtime (OCR), an open specification of a dynamic runtime system developed by leading US research institutions.
We will develop new runtime techniques that support dynamic partitioning and mapping of both computations and data, and that can dynamically re-balance parallel applications in response to changing machine characteristics and application workloads. A special focus will be on techniques that improve data locality, since application performance is limited by the cost of data movement. Research into dynamic runtime systems and heterogeneous parallel systems will be relevant across the whole spectrum of computing systems, from mobile and embedded systems all the way to high-end systems and supercomputers. Our developments will help accelerate the development of much-needed higher-level programming models and domain-specific languages and unlock the tremendous performance potential of future parallel systems for new application areas.

Final report

Recent developments in computer architecture have been characterized by a continued increase in parallelism, specialization, and heterogeneity, resulting in extremely complex architectures that only a diminishing number of expert programmers can cope with. Given the rapid innovation cycles as well as the complexity and wide variety of heterogeneous parallel systems, the still-prevailing approach of manually adapting and optimizing applications for each parallel architecture leads to very high software costs and has thus become unsustainable. To address these issues, the overarching goal of this project was to investigate, design, and implement new programming methods and tools that simplify the development of efficient yet portable applications for current and future parallel systems. Our approach is based on separating the specification of parallelism from its concrete implementation on a specific target architecture. Key to this approach is a task-based dynamic runtime system that can adapt programs to a specific architecture at runtime. Major research objectives addressed in the course of the project included runtime support for heterogeneous architectures, new locality-aware scheduling strategies for architectures with complex memory hierarchies, the development of platform descriptors to facilitate program adaptation to different target architectures, mechanisms for adapting task granularity, and high-level programming abstractions. Given the increasing importance of data analytics applications and their coupling with traditional HPC simulations, we also explored adaptive scheduling strategies for optimizing collocated applications. We believe that task-based runtime systems provide the flexibility required to efficiently realize future application scenarios such as in-situ data analytics and coupled simulation and machine learning.
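The separation described above, in which the program states only which tasks exist and what each one depends on while a runtime decides when and where they run, can be illustrated with a minimal sketch. This is not the OCR or OCR-Vx API (OCR itself is a C interface organized around event-driven tasks, events and data blocks). The helper `run_task_graph` and its `tasks` dictionary format are hypothetical, and Python's standard `graphlib.TopologicalSorter` (Python 3.9+) and a `concurrent.futures` thread pool stand in for a real runtime's dependence tracking and scheduler. It assumes an acyclic graph in which every dependency is itself a key of `tasks`.

```python
import graphlib
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_task_graph(tasks, workers=4):
    """Run a dataflow task graph (hypothetical helper, not an OCR API).

    `tasks` maps a task name to (function, names of the tasks it depends on).
    A task is submitted only once all of its dependencies have finished, and
    it receives their results as positional arguments in dependency order.
    """
    # graphlib expects a mapping node -> iterable of predecessors.
    sorter = graphlib.TopologicalSorter(
        {name: deps for name, (_, deps) in tasks.items()}
    )
    sorter.prepare()  # raises graphlib.CycleError if the graph is cyclic

    results = {}
    running = {}  # future -> name of the task it executes
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while sorter.is_active():
            # Hand every newly ready task to the worker pool.
            for name in sorter.get_ready():
                fn, deps = tasks[name]
                args = [results[dep] for dep in deps]
                running[pool.submit(fn, *args)] = name
            # Block until at least one running task completes.
            finished, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in finished:
                name = running.pop(future)
                results[name] = future.result()  # re-raises task errors
                sorter.done(name)  # may make successor tasks ready
    return results


if __name__ == "__main__":
    # A small diamond-shaped graph: load -> (sum, max) -> report.
    tasks = {
        "load": (lambda: list(range(10)), ()),
        "sum": (lambda xs: sum(xs), ("load",)),
        "max": (lambda xs: max(xs), ("load",)),
        "report": (lambda s, m: f"sum={s} max={m}", ("sum", "max")),
    }
    print(run_task_graph(tasks)["report"])  # prints "sum=45 max=9"
```

Because the task definitions never mention threads or cores, the same graph could be executed with a different `workers` value, or by a different scheduler (for example, a locality-aware one), without touching the tasks themselves; in miniature, this is the kind of performance portability the project argues task-based runtimes make possible.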
Our research contributed to advancing the state of the art in programming support for current and future parallel architectures and HPC systems. Although our research was carried out mainly in the context of the Open Community Runtime (OCR) and our open-source implementation OCR-Vx, our work is applicable to similar systems developed in Europe and the US. Given the continuously increasing complexity of emerging processor architectures, the insatiable demand for computational power, and the convergence of high performance computing with data analytics and machine learning, new methods for efficient and flexible software, such as those offered by task-based runtimes, have become increasingly important. We believe that our project has made significant contributions to these developments, with many opportunities for future work along these lines. Our results provide evidence for the hypothesis that asynchronous, task-based runtimes and associated programming systems are a promising way forward for developing new programming models and application software that can seamlessly take full advantage of future processor architectures.

Research institution(s)

Universität Wien - 100%

Research Output

49 Citations
16 Publications
1 Policies
1 Methods & Materials
1 Datasets & models
1 Software
1 Disseminations
1 Fundings

Publications

Title	Task-Based Performance Portability in HPC
DOI	10.5281/zenodo.5549731
Author	Aumage O

Title	Task-Based Performance Portability in HPC
DOI	10.5281/zenodo.5549730
Author	Aumage O

Title	NUMA-aware CPU core allocation in cooperating dynamic applications
DOI	10.1109/ipdpsw50202.2020.00158
Type	Conference Proceeding Abstract
Author	Dokulil J
Pages	950-957

Title	Consistency model for runtime objects in the Open Community Runtime
DOI	10.1007/s11227-018-2681-2
Type	Journal Article
Author	Dokulil J
Journal	The Journal of Supercomputing
Pages	2725-2760

Title	Automatic Detection of Synchronization Errors in Codes that Target the Open Community Runtime
DOI	10.1007/978-3-319-96983-1_1
Type	Book Chapter
Author	Dokulil J
Publisher	Springer Nature
Pages	3-15

Title	Adaptive Scheduling of Collocated Applications Using a Task-Based Runtime System
DOI	10.1109/cahpc.2018.8645869
Type	Conference Proceeding Abstract
Author	Dokulil J
Pages	41-48

Title	Visualization of Open Community Runtime Task Graphs
DOI	10.1109/iv.2017.31
Type	Conference Proceeding Abstract
Author	Dokulil J
Pages	236-241

Title	Extending the Open Community Runtime with External Application Support
DOI	10.1145/3152041.3152088
Type	Conference Proceeding Abstract
Author	Dokulil J
Pages	1-7

Title	Pipeline Patterns on Top of Task-Based Runtimes
DOI	10.1007/978-981-13-5907-1_11
Type	Book Chapter
Author	Bajrovic E
Publisher	Springer Nature
Pages	100-110

Title	The OCR-Vx experience: lessons learned from designing and implementing a task-based runtime system
DOI	10.1007/s11227-022-04355-0
Type	Journal Article
Author	Dokulil J
Journal	The Journal of Supercomputing
Pages	12344-12379

Title	The Open Community Runtime on the Intel Knights Landing Architecture
DOI	10.1007/978-3-319-65482-9_65
Type	Book Chapter
Author	Dokulil J
Publisher	Springer Nature
Pages	801-813

Title	Exploring the Performance of Fine-Grained Synchronization and Data Exchange Across Process Boundaries on Modern Multi-core Architectures
DOI	10.1007/978-3-030-22750-0_45
Type	Book Chapter
Author	Dokulil J
Publisher	Springer Nature
Pages	514-520

Title	A benchmark set of highly-efficient CUDA and OpenCL kernels and its dynamic autotuning with Kernel Tuning Toolkit
DOI	10.1016/j.future.2020.02.069
Type	Journal Article
Author	Petrovic F
Journal	Future Generation Computer Systems
Pages	161-177

Title	clusterCL: comprehensive support for multi-kernel data-parallel applications in heterogeneous asymmetric clusters
DOI	10.1007/s11227-020-03234-w
Type	Journal Article
Author	Raca V
Journal	The Journal of Supercomputing
Pages	9976-10008

Title	Automatic Placement of Tasks to NUMA Nodes in Iterative Applications
DOI	10.1109/pdp50117.2020.00036
Type	Conference Proceeding Abstract
Author	Dokulil J
Pages	192-195

Title	Let’s Put the Memory Model Front and Center When Teaching Parallel Programming in C++
DOI	10.1109/ipdpsw52791.2021.00057
Type	Conference Proceeding Abstract
Author	Dokulil J
Pages	315-320

Policies

Title	White paper ETP4HPC
Type	Participation in a guidance/advisory committee

Methods & Materials

Public Access
Title	OCR-Vx Software
Type	Improvements to research infrastructure

Datasets & models

Public Access
Title	Task-based programming models
Type	Computer model/algorithm

Software

Title	OCR-Vx runtime system

Disseminations

Title	White paper for European Technology Platform for High Performance Computing
Type	A magazine, newsletter or online publication

Fundings

Title	Offline and Online Autotuning of Parallel Applications
Type	Other
Start of Funding	2021
Funder	Austrian Science Fund (FWF)

DYNAMIC RUNTIME SYSTEM FOR FUTURE PARALLEL ARCHITECTURES