Projectdetail

Disciplines

Computer Sciences (40%); Mathematics (60%)

Keywords

Process Mapping, Graph Algorithms, Algorithms Engineering, Distributed And Parallel Algorithms

Abstract

Final report

Communication performance between application processes in high-performance computing systems depends on many factors like the capability and topology of the underlying communication system, the required communication between processes in the given applications, and the software and algorithms used to realize the communication. For example communication is typically faster if communicating processes are located on the same physical processor node compared to the cases where processes reside on different nodes. This becomes even more pronounced for large supercomputer systems where processors are hierarchically organized, e.g. islands, racks, nodes, ..., with corresponding communication links of similar quality, and where differences in process placement can make huge differences in communication performance. Given the communication pattern between processes and a hardware topology description that reflects the quality of the communication links, one hence seeks to find a good mapping of processes onto processors such that pairs of processes that exchange large amounts of data become located closely. This project will be foremost concerned with algorithm engineering of process mapping algorithms that improve known methods especially for large computing systems and large applications. The project will exploit two crucial assumptions that are typically valid for modern supercomputers and the applications that run on those. First, we assume that communication patterns are sparse since not all processes have to communicate with each other. Secondly, we assume that the hardware communication topology under consideration is hierarchical with communication links on the same level having the same communication performance. Resulting algorithms and implementations will be more robust, more flexible, produce better solutions, and scale to massively parallel machines and instances much larger than previously possible. We envision to arrive at more powerful algorithms for process mapping and sparse quadratic assignment problems. The most successful implementations will be provided as easy-to-use open source software. While centered around variations of the process mapping problem as a quadratic assignment problem, the project will explore other formulations of the problem as well.

Modern computing systems, especially at a large scale, are, even if the individual processors in the system are of the same kind, highly non-homogeneous. Memory access times depend on where a process is being executed and on which memory addresses are accessed, and can vary highly, also depending on the overall load on the system. Communication times between processes are likewise highly dependent on the process placement, the overall communication patterns, and the overall load on the system. The aim of the project was to explore and mitigate such effects for parallel applications on large, high-performance computing systems, typically programmed with the Message-Passing Interface MPI. For individual applications, MPI provides for structured use of communication resources by means of isolated communication domains (communicators) and explicit process remapping using specified communication patterns provided by the application. The project initially explored a traditional process mapping problem for MPI communicators and provided new heuristics for mapping of structured, so called Cartesian patterns to typical hierarchically clustered systems. In order to better understand the impact of such improved mappings, and to be able to explore in greater detail than usually done the usage of communication domains in applications, a new profiling tool with new and uncommon features was developed and made available, and subsequently used in the project. Finally, the effects of mappings of different applications running at the same time and co-located on the given system were studied in detail, showing that efficient mappings of the individual applications can lead to improved and better overall usage of the whole system.

Research institution(s)

Technische Universität Wien - 100%

International project participants

Henning Meyerhenke, Karlsruhe Institue for Technology - Germany
Mathias J. Krause, Universität Karlsruhe - Germany
Peter Sanders, Universität Karlsruhe - Germany
Torsten Hoefler, Eidgenössische Technische Hochschule Zürich - Switzerland
William D Gropp, University of Illinois - USA

Research Output

35 Citations
20 Publications
1 Scientific Awards

Publications

Title	Exploring Mapping Strategies for Co-allocated HPC Applications
DOI	10.1007/978-3-031-48803-0_31
Type	Book Chapter
Author	Vardas I
Publisher	Springer Nature
Pages	271-276
Link	Publication

Title	Uniform Algorithms for Reduce-scatter and (most) other Collectives for MPI
DOI	10.1109/cluster52292.2023.00031
Type	Conference Proceeding Abstract
Author	Träff J
Pages	284-294

Title	Using Mixed-Radix Decomposition to Enumerate Computational Resources of Deeply Hierarchical Architectures
DOI	10.1145/3624062.3624109
Type	Conference Proceeding Abstract
Author	Swartvagher P
Pages	405-415
Link	Publication

Title	Scalable Graph Algorithms
DOI	10.48550/arxiv.1912.00245
Type	Preprint
Author	Schulz C
Link	Publication

Title	An MPI-based Algorithm for Mapping Complex Networks onto Hierarchical Architectures
DOI	10.48550/arxiv.2107.02539
Type	Preprint
Author	Predari M
Link	Publication

Title	An MPI-based Algorithm for Mapping Complex Networks onto Hierarchical Architectures; In: Euro-Par 2021: Parallel Processing - 27th International Conference on Parallel and Distributed Computing, Lisbon, Portugal, September 1-3, 2021, Proceedings
DOI	10.1007/978-3-030-85665-6_11
Type	Book Chapter
Publisher	Springer International Publishing

Title	Efficient Process-to-Node Mapping Algorithms for Stencil Computations
DOI	10.1109/cluster49012.2020.00011
Type	Conference Proceeding Abstract
Author	Von Kirchbach K
Pages	1-11
Link	Publication

Title	MPI collective communication through a single set of interfaces: A case for orthogonality
DOI	10.1016/j.parco.2021.102826
Type	Journal Article
Author	Träff J
Journal	Parallel Computing
Pages	102826
Link	Publication

Title	Mpisee: Communicator-Centric Profiling of MPI Applications
DOI	10.1002/cpe.70158
Type	Journal Article
Author	Vardas I
Journal	Concurrency and Computation: Practice and Experience
Link	Publication

Title	An Overhead Analysis of MPI Profiling and Tracing Tools
DOI	10.1145/3526063.3535353
Type	Conference Proceeding Abstract
Author	Hunold S
Pages	5-13
Link	Publication

Title	mpisee: MPI Profiling for Communication and Communicator Structure
DOI	10.1109/ipdpsw55747.2022.00092
Type	Conference Proceeding Abstract
Author	Vardas I
Pages	520-529

Title	Modes, Persistence and Orthogonality: Blowing MPI Up
DOI	10.1109/scw63240.2024.00061
Type	Conference Proceeding Abstract
Author	Träff J
Pages	404-413

Title	Improved Parallel Application Performance and Makespan by Colocation and Topology-aware Process Mapping
DOI	10.1109/ccgrid59990.2024.00023
Type	Conference Proceeding Abstract
Author	Vardas I
Pages	119-124

Title	Library Development with MPI: Attributes, Request Objects, Group Communicator Creation, Local Reductions, and Datatypes
DOI	10.1145/3615318.3615323
Type	Conference Proceeding Abstract
Author	Träff J
Pages	1-10

Title	Load-Balanced Bottleneck Objectives in Process Mapping
DOI	10.5445/ir/1000117914
Type	Other
Author	Langguth J
Link	Publication

Title	Load-Balanced Bottleneck Objectives in Process Mapping
DOI	10.48550/arxiv.2001.09645
Type	Preprint
Author	Langguth J
Link	Publication

Title	High-Quality Hierarchical Process Mapping
DOI	10.48550/arxiv.2001.07134
Type	Preprint
Author	Faraj M
Link	Publication

Title	Better Process Mapping and Sparse Quadratic Assignment
DOI	10.1145/3409667
Type	Journal Article
Author	Von Kirchbach K
Journal	Journal of Experimental Algorithmics (JEA)
Pages	1-19
Link	Publication

Title	High-Quality Hierarchical Process Mapping
DOI	10.4230/lipics.sea.2020.4
Type	Conference Proceeding Abstract
Author	Faraj M
Conference	LIPIcs, Volume 160, SEA 2020
Pages	4:1 - 4:15
Link	Publication

Title	Efficient Process-to-Node Mapping Algorithms for Stencil Computations
DOI	10.48550/arxiv.2005.09521
Type	Preprint
Author	Hunold S
Link	Publication

Scientific Awards

Title	Best paper award, IEEE Cluster 2020 conference
Type	Poster/abstract prize
Level of Recognition	Continental/International

Go to overview page Discover

Go to overview page Funding

Go to overview page About Us

Go to overview page News

Algorithm Engineering for Process Mapping

Algorithm Engineering for Process Mapping

Disciplines

Keywords

Research Output

Contact

General information

Go to overview page Discover

Go to overview page Funding

Go to overview page About Us

Go to overview page News

SOCIAL MEDIA

SCILOG

Algorithm Engineering for Process Mapping

Algorithm Engineering for Process Mapping

Disciplines

Keywords

Research Output