CELERITY: advanCed modELing for scalablE distRIbuted runTime SYstems
DACH: Austria - Germany - Switzerland
Disciplines
Computer Sciences (100%)
Keywords
High Performance Computing,
Program Optimization
High-Performance Computing (HPC) plays a fundamental role in enabling scientific progress, as advances in many areas of science critically depend on improvements in computational modeling and processing power. The next major milestone for the HPC community is the transition from petascale to exascale, which poses difficult research challenges such as programming models for scientific productivity, scalable system software design, and energy efficiency. We propose the CELERITY environment to support the effective development of energy- and performance-efficient, predictably scalable and easy-to-program parallel applications targeting large-scale homogeneous and heterogeneous HPC clusters. Thanks to its high-level programming model, the CELERITY environment ensures high productivity by relieving the programmer of labor-intensive low-level concerns such as task partitioning and distribution, which are particularly demanding when targeting heterogeneous distributed architectures. To provide high performance, CELERITY will combine novel static kernel analyses with dynamic information provided by the runtime system, enabling modeling and prediction of parallel scalability and energy consumption for a given input application. The modeling will be based on advanced machine learning methodologies such as structural learning, which exploits the internal representation of the data to deliver accurate predictions. We plan to evaluate our system first on a small-scale heterogeneous cluster instrumented with highly accurate energy measurement devices, and subsequently on several large-scale clusters using the instrumentation available at those computing infrastructures, with a broad collection of application benchmarks.
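The abstract names two quantities to be modeled and predicted: parallel scalability and energy consumption. Using the standard definitions, speedup is S(p) = T(1) / T(p) and parallel efficiency is E(p) = S(p) / p, while energy is the time integral of measured power. The sketch below computes these from measurements, assuming power is sampled at a fixed interval; the function names are hypothetical and not part of CELERITY.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Energy in joules from power samples (watts) taken every dt_s seconds,
// integrated with the trapezoidal rule. Fewer than two samples yield 0.
double energy_joules(const std::vector<double>& power_w, double dt_s) {
    assert(dt_s > 0.0);
    double energy = 0.0;
    for (std::size_t i = 1; i < power_w.size(); ++i) {
        energy += 0.5 * (power_w[i - 1] + power_w[i]) * dt_s;
    }
    return energy;
}

// Speedup S(p) = T(1) / T(p) for p processing units.
double speedup(double t1, double tp) { return t1 / tp; }

// Parallel efficiency E(p) = S(p) / p; 1.0 means perfect linear scaling.
double efficiency(double t1, double tp, int p) { return speedup(t1, tp) / p; }
```

For example, samples of 100 W, 120 W and 110 W taken every 0.5 s integrate to 112.5 J, and a run that takes 64 s on one unit and 16 s on eight units has a speedup of 4 and an efficiency of 0.5.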
The Celerity project aims to provide an efficient, high-level platform for developing and running high-performance computing (HPC) applications on parallel computers with accelerator hardware. Existing programming approaches are limited to single accelerators, rely on vendor-controlled APIs, or use low-level abstractions that combine multiple existing APIs. While combining APIs can achieve good performance, it requires developer expertise in both distributed-memory parallel programming and accelerator computing, greatly increases development and maintenance effort, and often suffers from limited performance portability. Celerity builds on SYCL, a single-source, modern C++ programming model for heterogeneous parallel architectures, and proposes a minimal set of extensions to it that allows programs to target clusters of accelerators with only very minor modifications to a single-node SYCL program. We have demonstrated that the Celerity API reduces the effort of developing code for accelerator-based parallel computers by more than 50% compared to conventional approaches based on coupled APIs. While the SYCL-based Celerity API leads to significantly less complex code than a traditional approach that couples multiple APIs, it still requires developers to be familiar with the general concepts of accelerator development and all the associated programming complexity. As an alternative at an even higher level of abstraction, we designed and developed a high-level API that transparently maps a functional stream-processing specification to the Celerity API. The Celerity runtime system reduces the execution time of Celerity programs by using data structures that let it make work-division and distribution decisions while keeping track of data location and dependencies.
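The runtime described above decides how to divide work across nodes while tracking where data lives and what each piece of work depends on. The following is a minimal sketch of that kind of bookkeeping for a one-dimensional stencil kernel, assuming a contiguous block distribution across nodes and assuming each node holds the buffer elements it last wrote. All names (`split_range`, `neighborhood`, `required_transfers`) are hypothetical; this illustrates the idea only and is not Celerity's API or its actual implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Half-open 1D index interval [begin, end).
struct Interval {
    std::size_t begin;
    std::size_t end;
    bool empty() const { return begin >= end; }
};

// Split a 1D kernel range [0, n) into `nodes` contiguous chunks of nearly
// equal size; the first n % nodes chunks receive one extra element.
std::vector<Interval> split_range(std::size_t n, std::size_t nodes) {
    assert(nodes > 0);
    std::vector<Interval> chunks;
    std::size_t base = n / nodes, extra = n % nodes, start = 0;
    for (std::size_t i = 0; i < nodes; ++i) {
        std::size_t len = base + (i < extra ? 1 : 0);
        chunks.push_back({start, start + len});
        start += len;
    }
    return chunks;
}

// Hypothetical stencil access pattern: executing `chunk` reads every element
// within `radius` of it, clamped to the buffer bounds [0, n).
Interval neighborhood(Interval chunk, std::size_t radius, std::size_t n) {
    std::size_t b = chunk.begin > radius ? chunk.begin - radius : 0;
    std::size_t e = std::min(chunk.end + radius, n);
    return {b, e};
}

// Intersection of two intervals (empty if they do not overlap).
Interval intersect(Interval a, Interval b) {
    return {std::max(a.begin, b.begin), std::min(a.end, b.end)};
}

// One required data transfer: elements in `region` move from `src` to `dst`.
struct Transfer {
    std::size_t src;
    std::size_t dst;
    Interval region;
};

// Assuming node i holds buffer region chunks[i], compute which parts of its
// read region each node must fetch from the other nodes.
std::vector<Transfer> required_transfers(const std::vector<Interval>& chunks,
                                         std::size_t radius, std::size_t n) {
    std::vector<Transfer> transfers;
    for (std::size_t dst = 0; dst < chunks.size(); ++dst) {
        Interval reads = neighborhood(chunks[dst], radius, n);
        for (std::size_t src = 0; src < chunks.size(); ++src) {
            if (src == dst) continue;
            Interval overlap = intersect(reads, chunks[src]);
            if (!overlap.empty()) transfers.push_back({src, dst, overlap});
        }
    }
    return transfers;
}
```

For example, `required_transfers(split_range(12, 3), 1, 12)` yields four single-element transfers: each node fetches the halo element adjacent to its chunk from each neighboring node. Describing accesses per chunk, rather than per kernel, is what lets such a runtime derive communication automatically instead of asking the programmer to write it.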
As part of Celerity, we collaborate with a partner group at the University of Salerno, which used machine learning to predict the performance of various work-distribution strategies in order to guide the runtime optimization of Celerity programs. Overall, Celerity advanced the state of the art in programming and runtime-system techniques for performance-oriented programs that can benefit from parallel computers with a large number of accelerators. Existing approaches can achieve very good performance, but at the cost of a considerably more complex development effort than Celerity requires.
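Predicting the performance of several work-distribution strategies is useful because the runtime can then pick the one with the lowest predicted cost. A minimal sketch of that selection step follows, assuming the learned predictor can be approximated by a linear model over a few features. The features, `LinearModel`, and `select_strategy` are hypothetical illustrations; the linear model stands in for whatever model the partner group actually trained, and training itself is not shown.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// Hypothetical features describing one candidate work-distribution strategy,
// e.g. chunks per node, bytes transferred, load imbalance.
struct Candidate {
    std::string name;
    std::array<double, 3> features;
};

// Linear performance model: predicted time = bias + sum_i weights[i] * x[i].
// The weights are placeholders supplied by the caller, standing in for a
// trained model.
struct LinearModel {
    double bias;
    std::array<double, 3> weights;
    double predict(const std::array<double, 3>& x) const {
        double y = bias;
        for (std::size_t i = 0; i < x.size(); ++i) y += weights[i] * x[i];
        return y;
    }
};

// Return the index of the candidate with the lowest predicted runtime.
std::size_t select_strategy(const LinearModel& model,
                            const std::vector<Candidate>& candidates) {
    assert(!candidates.empty());
    std::size_t best = 0;
    double best_time = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        double t = model.predict(candidates[i].features);
        if (t < best_time) {
            best_time = t;
            best = i;
        }
    }
    return best;
}
```

With bias `1.0` and weights `{0.5, 2.0, 10.0}`, a candidate with features `{8.0, 1.0, 0.0}` is predicted at 7.0 time units and is preferred over one with features `{1.0, 4.0, 0.1}`, predicted at about 10.5. Separating prediction from selection means a better-trained model can be swapped in without changing the runtime's decision logic.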
- Universität Innsbruck - 100%
Research Output
- 89 Citations
- 16 Publications
- 1 Policy
- 1 Funding
2026
Title A Portable Compiler-Runtime Approach for Scalability Prediction DOI 10.1016/j.future.2025.108337 Type Journal Article Author Lal S Journal Future Generation Computer Systems
2024
Title Automatic Discovery of Collective Communication Patterns in Parallelized Task Graphs DOI 10.1007/s10766-024-00767-y Type Journal Article Author Knorr F Journal International Journal of Parallel Programming
2024
Title Productivity and Performance in Heterogeneous Parallel Computing Type Postdoctoral Thesis Author Thoman P
2021
Title ndzip-gpu: efficient lossless compression of scientific floating-point data on GPUs DOI 10.5281/zenodo.7437645 Type Preprint Author Knorr F
2021
Title ndzip-gpu: efficient lossless compression of scientific floating-point data on GPUs DOI 10.5281/zenodo.7437646 Type Preprint Author Knorr F
2022
Title Declarative Data Flow in a Graph-Based Distributed Memory Runtime System DOI 10.1007/s10766-022-00743-4 Type Journal Article Author Knorr F Journal International Journal of Parallel Programming Pages 150-171
2021
Title ndzip: A High-Throughput Parallel Lossless Compressor for Scientific Data DOI 10.1109/dcc50243.2021.00018 Type Conference Proceeding Abstract Author Knorr F Pages 103-112
2021
Title Sylkan: Towards a Vulkan Compute Target Platform for SYCL DOI 10.1145/3456669.3456683 Type Conference Proceeding Abstract Author Thoman P Pages 1-12
2021
Title Porting Real-World Applications to GPU Clusters: A Celerity and Cronos Case Study DOI 10.1109/escience51609.2021.00019 Type Conference Proceeding Abstract Author Gschwandtner P Pages 90-98
2022
Title The Celerity High-level API: C++20 for Accelerator Clusters DOI 10.1007/s10766-022-00731-8 Type Journal Article Author Thoman P Journal International Journal of Parallel Programming Pages 341-359
2020
Title RTX-RSim DOI 10.1145/3388333.3388662 Type Conference Proceeding Abstract Author Thoman P Pages 1-11
2020
Title SYCL-Bench DOI 10.1145/3388333.3388669 Type Conference Proceeding Abstract Author Lal S Pages 1-1
2020
Title SYCL-Bench: A Versatile Cross-Platform Benchmark Suite for Heterogeneous Computing; In: Euro-Par 2020: Parallel Processing - 26th International Conference on Parallel and Distributed Computing, Warsaw, Poland, August 24-28, 2020, Proceedings DOI 10.1007/978-3-030-57675-2_39 Type Book Chapter Publisher Springer International Publishing
2021
Title ndzip-gpu DOI 10.1145/3458817.3476224 Type Conference Proceeding Abstract Author Knorr F Pages 1-14
2019
Title Celerity: High-Level C++ for Accelerator Clusters DOI 10.1007/978-3-030-29400-7_21 Type Book Chapter Author Thoman P Publisher Springer Nature Pages 291-303
0
DOI 10.1145/3458817 Type Other
2021
Title European High-Performance Computing Joint Undertaking Type Research grant (including intramural programme) Start of Funding 2021 Funder European Union