Verifying Self-consistent MPI Performance Guidelines
Disciplines
Computer Sciences (100%)
Keywords
Message-passing interface (MPI), High-performance computing, Software and system benchmarking, Performance guidelines, Performance analysis, Self-consistent performance guidelines
The Message-Passing Interface (MPI) is the paradigmatic, de facto standard for parallel application development in both high- and medium-performance scientific computing. MPI implementations often provide high system utilization for many operations and support application portability. On the other hand, MPI implementations often suffer from performance inconsistencies, such as different performance characteristics for apparently similar MPI functions, or unintuitive performance jumps at different problem sizes caused by suboptimal algorithm and protocol choices; such effects can harm the performance portability of MPI applications. In particular, performance inconsistencies often lead application programmers to look for workarounds, which can be detrimental to performance under other MPI implementations and systems.

The project on "Verifying Self-consistent MPI Performance Guidelines" will contribute significantly to identifying such problems and, by implication, to improving the quality of MPI library implementations. Through the development of accurate, robust, fast, and customizable benchmarking procedures, coupled with automated data mining guided by so-called self-consistent performance guidelines, the performance consistency of any given MPI implementation can be validated automatically. The procedures and tools will help make application programmers and MPI implementers aware of MPI functionality that, in given libraries, may cause performance and/or performance-portability problems. The aim of the project is to build a high-quality MPI benchmark together with an exhaustive set of self-consistent MPI performance guidelines, with the ambitious goal of gaining acceptance for this way of benchmarking and of validating performance portability in the MPI and high-performance computing communities.

The technical aspects of the project are: providing statistically solid foundations for MPI benchmarking; engineering an accurate, fast, flexible, and customizable MPI benchmark; developing data-mining techniques for quickly and automatically identifying violations of self-consistent performance guidelines; and defining an, in principle, complete set of performance guidelines for the MPI standard. In addition to the technical challenges, the project aims to consolidate TU Wien's technical recognition in the MPI community, and will contribute to solidifying the MPI expertise of the TU Wien group for parallel computing. Possible contributions to improving aspects of MPI performance for given MPI functions, collective operations in particular, through improved algorithms and engineering should likewise be seen as part of the project. The project builds on research recognized by the community over recent years and will extend this promising direction.
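To make this concrete: one classic guideline from the self-consistent guidelines literature states that the specialized MPI_Allreduce should perform no worse than the same functionality composed from MPI_Reduce followed by MPI_Bcast. The following is a minimal, illustrative sketch of such a check in C with MPI; it is not the project's benchmark, and the simple barrier-plus-maximum timing used here is exactly the kind of naive scheme that the project's work on reproducible benchmarking improves upon. Buffer size and repetition count are arbitrary choices for illustration.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define REPS 100

/* Time an operation as the maximum per-rank average over REPS runs.
 * A single barrier is only crude synchronization; serious benchmarking
 * requires synchronized clocks and statistical evaluation. */
static double time_op(void (*op)(double *, double *, int, MPI_Comm),
                      double *in, double *out, int n, MPI_Comm comm)
{
    MPI_Barrier(comm);
    double start = MPI_Wtime();
    for (int r = 0; r < REPS; r++)
        op(in, out, n, comm);
    double local = (MPI_Wtime() - start) / REPS, global;
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_MAX, comm);
    return global;
}

/* The specialized operation... */
static void allreduce_op(double *in, double *out, int n, MPI_Comm comm)
{
    MPI_Allreduce(in, out, n, MPI_DOUBLE, MPI_SUM, comm);
}

/* ...and the same functionality expressed by more general means. */
static void reduce_bcast_op(double *in, double *out, int n, MPI_Comm comm)
{
    MPI_Reduce(in, out, n, MPI_DOUBLE, MPI_SUM, 0, comm);
    MPI_Bcast(out, n, MPI_DOUBLE, 0, comm);
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, n = 100000;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    double *in = malloc(n * sizeof(double));
    double *out = malloc(n * sizeof(double));
    for (int i = 0; i < n; i++) in[i] = 1.0;

    double t_specific = time_op(allreduce_op, in, out, n, MPI_COMM_WORLD);
    double t_general  = time_op(reduce_bcast_op, in, out, n, MPI_COMM_WORLD);

    /* Guideline: MPI_Allreduce(n) <= MPI_Reduce(n) + MPI_Bcast(n). */
    if (rank == 0 && t_specific > t_general)
        printf("guideline violated: MPI_Allreduce %.3es > "
               "MPI_Reduce+MPI_Bcast %.3es\n", t_specific, t_general);

    free(in); free(out);
    MPI_Finalize();
    return 0;
}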
Parallel computers, in particular large-scale high-performance systems, are programmed using convenient programming languages or, as is more often the case, interfaces that extend sequential programming languages with operations and algorithms for expressing and exploiting parallelism. Such interfaces should support the algorithmic thinking of the application programmer and at the same time allow efficient utilization of the parallel computer, which may be a large and expensive system. If the interface falls short in these two respects, resources are wasted in two ways: a) time, if the parallel resources of the system are not used efficiently and do not deliver the performance promised by the system, and b) effort on behalf of the application programmer trying to work around an inefficient interface.

The project on Verifying Self-consistent MPI Performance Guidelines explored an idea to address these two sources of waste and presented solutions for the specific and widely used high-performance computing Message-Passing Interface (MPI). The idea is to formulate expectations on how certain operations should behave in comparison to certain other operations, in a manner that can be validated by benchmarking the operations on concrete systems. Expectations typically state that more specific operations should perform no worse than the same operations expressed by more general means. Such expectations are thus guidelines on how good implementations of the MPI interface ought to behave, formulated by relating different aspects of the interface to each other. Fulfilled expectations guarantee a) that an operation gives consistent performance, and b) that the application programmer will not need to hand-optimize this functionality.

Characterizing the performance of operations offered by the MPI interface requires accurate and reproducible benchmarking, coupled with statistical procedures for deciding whether one operation (on one system) is better than some other operation that implements the same functionality (possibly on another system). The project contributed extensively to the surprisingly difficult problem of benchmarking MPI functionality, and led to extensive work on synchronizing individual clocks in large, distributed-memory systems. Using these new procedures with natural sets of performance guidelines for the so-called collective operations, different MPI implementations were studied on different systems, and many severe violations were documented. Performance guidelines immediately suggest an obvious way of improving the situation: replacing the operation, over the range where the violation occurs, by the operation measured to be better. This replacement can be performed automatically, and this route to improving the quality of MPI implementations was also pursued.

MPI is a rich interface with strong interrelations between orthogonal parts of the interface. This can be exploited to formulate performance guidelines that go beyond simple expectations on similar collective operations. The project contributed extensively to formulating, and investigating the adherence to, new performance guidelines for structured data communication using MPI user-defined datatypes, and for sparse collective communication with neighborhood collective operations; for example, sending strided data described by a derived datatype should be expected to perform no worse than manually packing and sending the same data (see the sketch below). Also for the algorithmically difficult irregular collective operations, new performance guidelines were formulated and investigated.
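As an illustration of such a datatype guideline, the following sketch (again deliberately simplified, with the same naive timing caveats as above) times a strided transfer described by MPI_Type_vector against the same transfer performed by explicit packing and unpacking; the stride, count, and repetition count are arbitrary illustrative choices.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define REPS 100

/* Guideline sketch: a strided send described by MPI_Type_vector should
 * perform no worse than manually packing the same data into a contiguous
 * buffer, sending it, and unpacking it on the receiver.
 * Run with exactly two processes. */
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int count = 10000, stride = 4;     /* one double out of every four */
    double *data = calloc((size_t)count * stride, sizeof(double));
    double *pack = malloc(count * sizeof(double));

    MPI_Datatype vec;                  /* count blocks of 1 double, strided */
    MPI_Type_vector(count, 1, stride, MPI_DOUBLE, &vec);
    MPI_Type_commit(&vec);

    /* Variant 1: let the MPI library handle the strided layout. */
    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int r = 0; r < REPS; r++) {
        if (rank == 0)
            MPI_Send(data, 1, vec, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(data, 1, vec, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    double t_vec = (MPI_Wtime() - t0) / REPS;

    /* Variant 2: pack manually, send contiguously, unpack manually. */
    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (int r = 0; r < REPS; r++) {
        if (rank == 0) {
            for (int i = 0; i < count; i++) pack[i] = data[i * stride];
            MPI_Send(pack, count, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(pack, count, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            for (int i = 0; i < count; i++) data[i * stride] = pack[i];
        }
    }
    double t_pack = (MPI_Wtime() - t0) / REPS;

    if (rank == 1 && t_vec > t_pack)
        printf("datatype guideline violated: vector %.3es > pack %.3es\n",
               t_vec, t_pack);

    MPI_Type_free(&vec);
    free(data); free(pack);
    MPI_Finalize();
    return 0;
}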
In these new areas, algorithmic problems were studied, and in many cases new, better algorithms were proposed, implemented, and shown to improve performance and guideline adherence. The results of the project, and the further research it has stimulated, will have a long-term impact on improving the quality of implementations of the MPI standard.
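For the irregular collectives, one natural guideline states that a regular collective should perform no worse than its irregular generalization called with the same, uniform arguments, for example MPI_Gather versus MPI_Gatherv with equal counts and consecutive displacements. A minimal sketch of such a check, with the same simplified timing as the sketches above:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define REPS 100

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size, n = 1000;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double *send = malloc(n * sizeof(double));
    double *recv = malloc((size_t)n * size * sizeof(double));
    int *counts = malloc(size * sizeof(int));
    int *displs = malloc(size * sizeof(int));
    for (int i = 0; i < n; i++) send[i] = (double)rank;
    /* Uniform counts and consecutive displacements: the "regular" case. */
    for (int i = 0; i < size; i++) { counts[i] = n; displs[i] = i * n; }

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int r = 0; r < REPS; r++)
        MPI_Gather(send, n, MPI_DOUBLE, recv, n, MPI_DOUBLE, 0,
                   MPI_COMM_WORLD);
    double t_gather = (MPI_Wtime() - t0) / REPS;

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (int r = 0; r < REPS; r++)
        MPI_Gatherv(send, n, MPI_DOUBLE, recv, counts, displs, MPI_DOUBLE,
                    0, MPI_COMM_WORLD);
    double t_gatherv = (MPI_Wtime() - t0) / REPS;

    /* Guideline: MPI_Gather(n) <= MPI_Gatherv(n) with uniform arguments. */
    if (rank == 0 && t_gather > t_gatherv)
        printf("guideline violated: MPI_Gather %.3es > MPI_Gatherv %.3es\n",
               t_gather, t_gatherv);

    free(send); free(recv); free(counts); free(displs);
    MPI_Finalize();
    return 0;
}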
Research Institution
- Technische Universität Wien - 100%
International Project Participants
- Peter Sanders, Universität Karlsruhe - Germany
- Thomas Worsch, Universität Karlsruhe - Germany
- Torsten Hoefler, Eidgenössische Technische Hochschule Zürich - Switzerland
- Rajeev Thakur, Argonne National Laboratory - USA
- Robert A. van de Geijn, The University of Texas at Austin - USA
- William D. Gropp, University of Illinois - USA
Research Output
- 205 Citations
- 17 Publications
- 2014: Träff J, "Implementing a classic", conference paper, pp. 135-144. DOI: 10.1145/2597652.2597662
- 2014: Träff J, "Optimal MPI Datatype Normalization for Vector and Index-block Types", conference paper, pp. 33-38. DOI: 10.1145/2642769.2642771
- 2014: Hunold S, "Reproducible MPI Micro-Benchmarking Isn't As Easy As You Think", conference paper, pp. 69-76. DOI: 10.1145/2642769.2642785
- 2014: Träff J, "MPI Collectives and Datatypes for Hierarchical All-to-all Communication", conference paper, pp. 27-32. DOI: 10.1145/2642769.2642770
- 2014: Träff J, "Zero-copy, Hierarchical Gather is not possible with MPI Datatypes and Collectives", conference paper, pp. 39-44. DOI: 10.1145/2642769.2642772
- 2015: Träff J, "Specification Guideline Violations by MPI_Dims_create", conference paper, pp. 1-2. DOI: 10.1145/2802658.2802677
- 2015: Kalany M, "Efficient, Optimal MPI Datatype Reconstruction for Vector and Index Types", conference paper, pp. 1-10. DOI: 10.1145/2802658.2802671
- 2015: Träff J, "Isomorphic, Sparse MPI-like Collective Communication Operations for Parallel Stencil Computations", conference paper, pp. 1-10. DOI: 10.1145/2802658.2802663
- 2016: Hunold S, "Reproducible MPI Benchmarking is Still Not as Easy as You Think", journal article, IEEE Transactions on Parallel and Distributed Systems, pp. 3617-3630. DOI: 10.1109/tpds.2016.2539167
- 2016: Carpen-Amarie A, "On the Expected and Observed Communication Performance with MPI Derived Datatypes", conference paper, pp. 108-120. DOI: 10.1145/2966884.2966905
- 2016: Hunold S, "Automatic Verification of Self-consistent MPI Performance Guidelines", book chapter, Springer Nature, pp. 433-446. DOI: 10.1007/978-3-319-43659-3_32
- 2016: Ganian R, "Polynomial-time Construction of Optimal MPI Derived Datatype Trees", conference paper, pp. 638-647. DOI: 10.1109/ipdps.2016.13
- 2017: Carpen-Amarie A, "On expected and observed communication performance with MPI derived datatypes", journal article, Parallel Computing, pp. 98-117. DOI: 10.1016/j.parco.2017.08.006
- 2017: Lübbe F, "Micro-benchmarking MPI Neighborhood Collective Operations", book chapter, Springer Nature, pp. 65-78. DOI: 10.1007/978-3-319-64203-1_5
- 2017: Träff J, "Practical, linear-time, fully distributed algorithms for irregular gather and scatter", conference paper, pp. 1-10. DOI: 10.1145/3127024.3127025
- 2018: Hunold S, "Autotuning MPI Collectives using Performance Guidelines", conference paper, pp. 64-74. DOI: 10.1145/3149457.3149461
- 2018: Träff J, "Practical, distributed, low overhead algorithms for irregular gather and scatter collectives", journal article, Parallel Computing, pp. 100-117. DOI: 10.1016/j.parco.2018.04.003