Algorithm Engineering for Process Mapping
Algorithm Engineering for Process Mapping
Disciplines
Computer Sciences (40%); Mathematics (60%)
Keywords
-
Process Mapping,
Graph Algorithms,
Algorithms Engineering,
Distributed And Parallel Algorithms
Communication performance between application processes in high-performance computing systems depends on many factors like the capability and topology of the underlying communication system, the required communication between processes in the given applications, and the software and algorithms used to realize the communication. For example communication is typically faster if communicating processes are located on the same physical processor node compared to the cases where processes reside on different nodes. This becomes even more pronounced for large supercomputer systems where processors are hierarchically organized, e.g. islands, racks, nodes, ..., with corresponding communication links of similar quality, and where differences in process placement can make huge differences in communication performance. Given the communication pattern between processes and a hardware topology description that reflects the quality of the communication links, one hence seeks to find a good mapping of processes onto processors such that pairs of processes that exchange large amounts of data become located closely. This project will be foremost concerned with algorithm engineering of process mapping algorithms that improve known methods especially for large computing systems and large applications. The project will exploit two crucial assumptions that are typically valid for modern supercomputers and the applications that run on those. First, we assume that communication patterns are sparse since not all processes have to communicate with each other. Secondly, we assume that the hardware communication topology under consideration is hierarchical with communication links on the same level having the same communication performance. Resulting algorithms and implementations will be more robust, more flexible, produce better solutions, and scale to massively parallel machines and instances much larger than previously possible. We envision to arrive at more powerful algorithms for process mapping and sparse quadratic assignment problems. The most successful implementations will be provided as easy-to-use open source software. While centered around variations of the process mapping problem as a quadratic assignment problem, the project will explore other formulations of the problem as well.
Modern computing systems, especially at a large scale, are, even if the individual processors in the system are of the same kind, highly non-homogeneous. Memory access times depend on where a process is being executed and on which memory addresses are accessed, and can vary highly, also depending on the overall load on the system. Communication times between processes are likewise highly dependent on the process placement, the overall communication patterns, and the overall load on the system. The aim of the project was to explore and mitigate such effects for parallel applications on large, high-performance computing systems, typically programmed with the Message-Passing Interface MPI. For individual applications, MPI provides for structured use of communication resources by means of isolated communication domains (communicators) and explicit process remapping using specified communication patterns provided by the application. The project initially explored a traditional process mapping problem for MPI communicators and provided new heuristics for mapping of structured, so called Cartesian patterns to typical hierarchically clustered systems. In order to better understand the impact of such improved mappings, and to be able to explore in greater detail than usually done the usage of communication domains in applications, a new profiling tool with new and uncommon features was developed and made available, and subsequently used in the project. Finally, the effects of mappings of different applications running at the same time and co-located on the given system were studied in detail, showing that efficient mappings of the individual applications can lead to improved and better overall usage of the whole system.
- Technische Universität Wien - 100%
- Henning Meyerhenke, Karlsruhe Institue for Technology - Germany
- Mathias J. Krause, Universität Karlsruhe - Germany
- Peter Sanders, Universität Karlsruhe - Germany
- Torsten Hoefler, Eidgenössische Technische Hochschule Zürich - Switzerland
- William D Gropp, University of Illinois - USA
Research Output
- 26 Citations
- 19 Publications
- 1 Scientific Awards
-
2024
Title Exploring Mapping Strategies forCo-allocated HPC Applications; In: Euro-Par 2023: Parallel Processing Workshops - Euro-Par 2023 International Workshops, Limassol, Cyprus, August 28 - September 1, 2023, Revised Selected Papers, Part II DOI 10.1007/978-3-031-48803-0_31 Type Book Chapter Publisher Springer Nature Switzerland -
2024
Title Modes, Persistence and Orthogonality: Blowing MPI Up DOI 10.1109/scw63240.2024.00061 Type Conference Proceeding Abstract Author Träff J Pages 404-413 -
2024
Title Improved Parallel Application Performance and Makespan by Colocation and Topology-aware Process Mapping DOI 10.1109/ccgrid59990.2024.00023 Type Conference Proceeding Abstract Author Hunold S Pages 119-124 -
2022
Title An Overhead Analysis of MPI Profiling and Tracing Tools DOI 10.1145/3526063.3535353 Type Conference Proceeding Abstract Author Hunold S Pages 5-13 Link Publication -
2022
Title mpisee: MPI Profiling for Communication and Communicator Structure DOI 10.1109/ipdpsw55747.2022.00092 Type Conference Proceeding Abstract Author Vardas I Pages 520-529 -
2021
Title MPI collective communication through a single set of interfaces: A case for orthogonality DOI 10.1016/j.parco.2021.102826 Type Journal Article Author Träff J Journal Parallel Computing Pages 102826 Link Publication -
2020
Title Better Process Mapping and Sparse Quadratic Assignment DOI 10.1145/3409667 Type Journal Article Author Von Kirchbach K Journal Journal of Experimental Algorithmics (JEA) Pages 1-19 Link Publication -
2023
Title Library Development with MPI: Attributes, Request Objects, Group Communicator Creation, Local Reductions, and Datatypes DOI 10.1145/3615318.3615323 Type Conference Proceeding Abstract Author Träff J Pages 1-10 -
2021
Title An MPI-based Algorithm for Mapping Complex Networks onto Hierarchical Architectures; In: Euro-Par 2021: Parallel Processing - 27th International Conference on Parallel and Distributed Computing, Lisbon, Portugal, September 1-3, 2021, Proceedings DOI 10.1007/978-3-030-85665-6_11 Type Book Chapter Publisher Springer International Publishing -
2021
Title An MPI-based Algorithm for Mapping Complex Networks onto Hierarchical Architectures DOI 10.48550/arxiv.2107.02539 Type Preprint Author Predari M Link Publication -
2019
Title Scalable Graph Algorithms DOI 10.48550/arxiv.1912.00245 Type Preprint Author Schulz C Link Publication -
2023
Title Uniform Algorithms for Reduce-scatter and (most) other Collectives for MPI DOI 10.1109/cluster52292.2023.00031 Type Conference Proceeding Abstract Author Hunold S Pages 284-294 -
2023
Title Using Mixed-Radix Decomposition to Enumerate Computational Resources of Deeply Hierarchical Architectures DOI 10.1145/3624062.3624109 Type Conference Proceeding Abstract Author Hunold S Pages 405-415 -
2020
Title Efficient Process-to-Node Mapping Algorithms for Stencil Computations DOI 10.48550/arxiv.2005.09521 Type Preprint Author Hunold S Link Publication -
2020
Title High-Quality Hierarchical Process Mapping DOI 10.48550/arxiv.2001.07134 Type Preprint Author Faraj M Link Publication -
2020
Title Load-Balanced Bottleneck Objectives in Process Mapping DOI 10.48550/arxiv.2001.09645 Type Preprint Author Langguth J Link Publication -
2020
Title High-Quality Hierarchical Process Mapping DOI 10.4230/lipics.sea.2020.4 Type Conference Proceeding Abstract Author Faraj M Conference LIPIcs, Volume 160, SEA 2020 Pages 4:1 - 4:15 Link Publication -
2020
Title Load-Balanced Bottleneck Objectives in Process Mapping DOI 10.5445/ir/1000117914 Type Other Author Langguth J Link Publication -
2020
Title Efficient Process-to-Node Mapping Algorithms for Stencil Computations DOI 10.1109/cluster49012.2020.00011 Type Conference Proceeding Abstract Author Von Kirchbach K Pages 1-11 Link Publication
-
2020
Title Best paper award, IEEE Cluster 2020 conference Type Poster/abstract prize Level of Recognition Continental/International