DYNAMIC RUNTIME SYSTEM FOR FUTURE PARALLEL ARCHITECTURES
Disciplines
Computer Sciences (100%)
Keywords
Runtime Systems, Parallel Programming, Parallel Architectures, High Performance Computing, Multicore Architectures
Over the last decade the architecture of computing systems has undergone significant changes. Multi-core processors are now ubiquitous, and only software that is properly parallelized will run faster on new processor generations. The need for more energy-efficient systems is now resulting in another disruptive technology shift towards heterogeneous multi-core systems, which combine different types of execution units with often dynamically varying performance characteristics. Heterogeneous multi-core architectures add another level of complexity to the development of efficient parallel software, one that only a diminishing number of expert programmers are capable of coping with. Given the rapid innovation cycles in processor architectures as well as the complexity and wide variety of heterogeneous parallel systems, the current approach of manually adapting and optimizing applications for different parallel architectures leads to very high software costs and is becoming unsustainable. A new approach to the development of parallel applications is therefore needed, in which programmers specify only the potential parallelism in their programs but not how parallel execution is implemented on a specific system. Key to such an approach is a dynamic runtime system that is capable of adapting programs at runtime to a specific architecture. In this project our goal is to develop new methods for dynamic runtime systems to facilitate the development of efficient applications for current and future heterogeneous parallel architectures and to improve performance portability across different types of heterogeneous systems. Our work will be based on the Open Community Runtime (OCR), an open specification of a dynamic runtime system, developed by leading US research institutions.
We will develop new runtime techniques that support dynamic partitioning and mapping of both computations and data and that provide the ability to dynamically re-balance parallel applications in response to changing machine characteristics and application workloads. A special focus will be on techniques that improve data locality, since the performance of applications is limited by the cost of data movement. The research into dynamic runtime systems and heterogeneous parallel systems will be relevant across the whole spectrum of computing systems, from mobile and embedded systems all the way to high-end systems and supercomputers. Our developments will help to accelerate the development of much-needed higher-level programming models and domain-specific languages and to unlock the tremendous performance potential of future parallel systems for new application areas.
Recent developments in computer architecture have been characterized by a continued increase in parallelism, specialization, and heterogeneity, resulting in extremely complex architectures that only a diminishing number of expert programmers are capable of coping with. Given the rapid innovation cycles as well as the complexity and wide variety of heterogeneous parallel systems, the still prevailing approach of manually adapting and optimizing applications for different parallel architectures leads to very high software costs and has thus become unsustainable. To address these issues, the overarching goal of this project was to investigate, design, and implement new programming methods and tools that simplify the development of efficient yet portable applications for current and future parallel systems. Our approach is based on separating the specification of parallelism from its concrete implementation on a specific target architecture. Key to this approach is a task-based dynamic runtime system that is capable of adapting programs at runtime to a specific architecture. Major research objectives addressed in the course of the project included runtime support for heterogeneous architectures, new locality-aware scheduling strategies for architectures with complex memory hierarchies, development of platform descriptors to facilitate program adaptation for different target architectures, mechanisms for adapting task granularity, and high-level programming abstractions. Given the increasing importance of data analytics applications and their coupling with traditional HPC simulations, we also explored new strategies for optimizing collocated applications through adaptive scheduling. We believe that task-based runtime systems provide the flexibility required to efficiently realize future application scenarios such as in-situ data analytics and coupled simulation/machine learning.
Our research contributed to the advancement of the state of the art in programming support for current and future parallel architectures and HPC systems. Although our research was mainly done in the context of the Open Community Runtime (OCR) and our open-source implementation OCR-Vx, our work is applicable to similar systems developed in Europe and the US. Given the continuously increasing complexity of emerging processor architectures, the insatiable demand for computational power, and the convergence of high performance computing with data analytics and machine learning, new methods for efficient and flexible software as offered by task-based runtimes have become increasingly important. We believe that our project has made significant contributions to these developments, with many opportunities for future work along these lines. Our research results provide evidence for the hypothesis that asynchronous, task-based runtimes and associated programming systems are a promising way forward for the development of new programming models and application software that can seamlessly take full advantage of future processor architectures.
- Universität Wien - 100%
Research Output
- 49 Citations
- 16 Publications
- 1 Policies
- 1 Methods & Materials
- 1 Datasets & models
- 1 Software
- 1 Disseminations
- 1 Fundings
-
2021
Title Task-Based Performance Portability in HPC DOI 10.5281/zenodo.5549731 Author Aumage O Link Publication -
2021
Title Task-Based Performance Portability in HPC DOI 10.5281/zenodo.5549730 Author Aumage O Link Publication -
2020
Title NUMA-aware CPU core allocation in cooperating dynamic applications DOI 10.1109/ipdpsw50202.2020.00158 Type Conference Proceeding Abstract Author Dokulil J Pages 950-957 -
2018
Title Consistency model for runtime objects in the Open Community Runtime DOI 10.1007/s11227-018-2681-2 Type Journal Article Author Dokulil J Journal The Journal of Supercomputing Pages 2725-2760 Link Publication -
2018
Title Automatic Detection of Synchronization Errors in Codes that Target the Open Community Runtime DOI 10.1007/978-3-319-96983-1_1 Type Book Chapter Author Dokulil J Publisher Springer Nature Pages 3-15 -
2018
Title Adaptive Scheduling of Collocated Applications Using a Task-Based Runtime System DOI 10.1109/cahpc.2018.8645869 Type Conference Proceeding Abstract Author Dokulil J Pages 41-48 -
2017
Title Visualization of Open Community Runtime Task Graphs DOI 10.1109/iv.2017.31 Type Conference Proceeding Abstract Author Dokulil J Pages 236-241 -
2017
Title Extending the Open Community Runtime with External Application Support DOI 10.1145/3152041.3152088 Type Conference Proceeding Abstract Author Dokulil J Pages 1-7 -
2019
Title Pipeline Patterns on Top of Task-Based Runtimes DOI 10.1007/978-981-13-5907-1_11 Type Book Chapter Author Bajrovic E Publisher Springer Nature Pages 100-110 -
2022
Title The OCR-Vx experience: lessons learned from designing and implementing a task-based runtime system DOI 10.1007/s11227-022-04355-0 Type Journal Article Author Dokulil J Journal The Journal of Supercomputing Pages 12344-12379 Link Publication -
2017
Title The Open Community Runtime on the Intel Knights Landing Architecture DOI 10.1007/978-3-319-65482-9_65 Type Book Chapter Author Dokulil J Publisher Springer Nature Pages 801-813 -
2019
Title Exploring the Performance of Fine-Grained Synchronization and Data Exchange Across Process Boundaries on Modern Multi-core Architectures DOI 10.1007/978-3-030-22750-0_45 Type Book Chapter Author Dokulil J Publisher Springer Nature Pages 514-520 -
2020
Title A benchmark set of highly-efficient CUDA and OpenCL kernels and its dynamic autotuning with Kernel Tuning Toolkit DOI 10.1016/j.future.2020.02.069 Type Journal Article Author Petrovic F Journal Future Generation Computer Systems Pages 161-177 Link Publication -
2020
Title clusterCL: comprehensive support for multi-kernel data-parallel applications in heterogeneous asymmetric clusters DOI 10.1007/s11227-020-03234-w Type Journal Article Author Raca V Journal The Journal of Supercomputing Pages 9976-10008 -
2020
Title Automatic Placement of Tasks to NUMA Nodes in Iterative Applications DOI 10.1109/pdp50117.2020.00036 Type Conference Proceeding Abstract Author Dokulil J Pages 192-195 -
2021
Title Let’s Put the Memory Model Front and Center When Teaching Parallel Programming in C++ DOI 10.1109/ipdpsw52791.2021.00057 Type Conference Proceeding Abstract Author Dokulil J Pages 315-320
-
2021
Title White paper ETP4HPC Type Participation in a guidance/advisory committee
-
2021
Title Offline and Online Autotuning of Parallel Applications Type Other Start of Funding 2021 Funder Austrian Science Fund (FWF)