Verifying Self-consistent MPI Performance Guidelines
Disciplines
Computer Sciences (100%)
Keywords
Message-passing interface (MPI), High-performance computing, Software and system benchmarking, Performance guidelines, Performance analysis, Self-consistent performance guidelines
The Message-Passing Interface (MPI) is the paradigmatic, de facto standard for parallel application development in both high- and medium-performance scientific computing. MPI implementations often provide high system utilization for many operations and support application portability. On the other hand, MPI implementations often suffer from performance inconsistencies, such as different performance characteristics for apparently similar MPI functions, or unintuitive performance jumps at different problem sizes caused by suboptimal algorithm and protocol choices; such effects can harm the performance portability of MPI applications. In particular, performance inconsistencies often lead application programmers to look for workarounds, which can be detrimental to performance under other MPI implementations and systems.

The project on "Verifying Self-consistent MPI Performance Guidelines" will contribute significantly to identifying such problems and, by implication, to improving the quality of MPI library implementations. Through the development of accurate, robust, fast, and customizable benchmarking procedures, coupled with automated data mining guided by so-called self-consistent performance guidelines, the performance consistency of any given MPI implementation can be validated automatically. The procedures and tools will help make application programmers and MPI implementers aware of MPI functionality that, in given libraries, may cause performance and/or performance-portability problems. The aim of the project is to build a high-quality MPI benchmark together with an exhaustive set of self-consistent MPI performance guidelines, with the ambitious goal of gaining acceptance for this way of benchmarking and of validating performance portability in the MPI and high-performance computing communities.

The technical aspects of the project are: providing statistically solid foundations for MPI benchmarking; engineering an accurate, fast, flexible, and customizable MPI benchmark; developing data-mining techniques for quickly and automatically identifying violations of self-consistent performance guidelines; and defining an, in principle, complete set of performance guidelines for the MPI standard. In addition to the technical challenges, the project aims to consolidate TU Wien's technical recognition in the MPI community, and will contribute to solidifying the MPI expertise of the TU Wien group for parallel computing. Possible contributions to improving aspects of MPI performance for given MPI functions, collective operations in particular, through improved algorithms and engineering should likewise be seen as part of the project. The project builds on research recognized by the community over recent years and will extend this promising direction.
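To make this concrete: one classic guideline from the self-consistent guidelines literature states that the specialized MPI_Allreduce should perform no worse than the same functionality composed from MPI_Reduce followed by MPI_Bcast. The following is a minimal, illustrative sketch of such a check in C with MPI; it is not the project's benchmark, and the simple barrier-plus-maximum timing used here is exactly the kind of naive scheme that the project's work on reproducible benchmarking improves upon. Buffer size and repetition count are arbitrary choices for illustration.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define REPS 100

/* Time an operation as the maximum per-rank average over REPS runs.
 * A single barrier is only crude synchronization; serious benchmarking
 * requires synchronized clocks and statistical evaluation. */
static double time_op(void (*op)(double *, double *, int, MPI_Comm),
                      double *in, double *out, int n, MPI_Comm comm)
{
    MPI_Barrier(comm);
    double start = MPI_Wtime();
    for (int r = 0; r < REPS; r++)
        op(in, out, n, comm);
    double local = (MPI_Wtime() - start) / REPS, global;
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_MAX, comm);
    return global;
}

/* The specialized operation... */
static void allreduce_op(double *in, double *out, int n, MPI_Comm comm)
{
    MPI_Allreduce(in, out, n, MPI_DOUBLE, MPI_SUM, comm);
}

/* ...and the same functionality expressed by more general means. */
static void reduce_bcast_op(double *in, double *out, int n, MPI_Comm comm)
{
    MPI_Reduce(in, out, n, MPI_DOUBLE, MPI_SUM, 0, comm);
    MPI_Bcast(out, n, MPI_DOUBLE, 0, comm);
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, n = 100000;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    double *in = malloc(n * sizeof(double));
    double *out = malloc(n * sizeof(double));
    for (int i = 0; i < n; i++) in[i] = 1.0;

    double t_specific = time_op(allreduce_op, in, out, n, MPI_COMM_WORLD);
    double t_general  = time_op(reduce_bcast_op, in, out, n, MPI_COMM_WORLD);

    /* Guideline: MPI_Allreduce(n) <= MPI_Reduce(n) + MPI_Bcast(n). */
    if (rank == 0 && t_specific > t_general)
        printf("guideline violated: MPI_Allreduce %.3es > "
               "MPI_Reduce+MPI_Bcast %.3es\n", t_specific, t_general);

    free(in); free(out);
    MPI_Finalize();
    return 0;
}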
Parallel computers, in particular large-scale high-performance systems, are programmed using convenient programming languages or, as is more often the case, interfaces that extend sequential programming languages with operations and algorithms for expressing and exploiting parallelism. Such interfaces should support the algorithmic thinking of the application programmer and at the same time allow efficient utilization of the parallel computer, which may be a large and expensive system. If the interface falls short in these two respects, resources are wasted in two ways: a) time, if the parallel resources of the system are not used efficiently and do not deliver the performance promised by the system, and b) effort on behalf of the application programmer trying to work around an inefficient interface.

The project on Verifying Self-consistent MPI Performance Guidelines explored an idea to address these two sources of waste and presented solutions for the specific and widely used high-performance computing Message-Passing Interface (MPI). The idea is to formulate expectations on how certain operations should behave in comparison to certain other operations, in a manner that can be validated by benchmarking the operations on concrete systems. Expectations typically state that more specific operations should perform no worse than the same operations expressed by more general means. Such expectations are thus guidelines on how good implementations of the MPI interface ought to behave, formulated by relating different aspects of the interface to each other. Fulfilled expectations guarantee a) that an operation gives consistent performance, and b) that the application programmer will not need to hand-optimize this functionality.

Characterizing the performance of operations offered by the MPI interface requires accurate and reproducible benchmarking, coupled with statistical procedures for deciding whether one operation (on one system) is better than some other operation that implements the same functionality (possibly on another system). The project contributed extensively to the surprisingly difficult problem of benchmarking MPI functionality, and led to extensive work on synchronizing individual clocks in large, distributed-memory systems. Using these new procedures with natural sets of performance guidelines for the so-called collective operations, different MPI implementations were studied on different systems, and many severe violations were documented. Performance guidelines immediately suggest an obvious way of improving the situation: replacing the operation, over the range where the violation occurs, by the operation measured to be better. This replacement can be performed automatically, and this route to improving the quality of MPI implementations was also pursued.

MPI is a rich interface with strong interrelations between orthogonal parts of the interface. This can be exploited to formulate performance guidelines that go beyond simple expectations on similar collective operations. The project contributed extensively to formulating, and investigating the adherence to, new performance guidelines for structured data communication using MPI user-defined datatypes, and for sparse collective communication with neighborhood collective operations; for example, sending strided data described by a derived datatype should be expected to perform no worse than manually packing and sending the same data (see the sketch below). Also for the algorithmically difficult irregular collective operations, new performance guidelines were formulated and investigated.
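As an illustration of such a datatype guideline, the following sketch (again deliberately simplified, with the same naive timing caveats as above) times a strided transfer described by MPI_Type_vector against the same transfer performed by explicit packing and unpacking; the stride, count, and repetition count are arbitrary illustrative choices.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define REPS 100

/* Guideline sketch: a strided send described by MPI_Type_vector should
 * perform no worse than manually packing the same data into a contiguous
 * buffer, sending it, and unpacking it on the receiver.
 * Run with exactly two processes. */
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int count = 10000, stride = 4;     /* one double out of every four */
    double *data = calloc((size_t)count * stride, sizeof(double));
    double *pack = malloc(count * sizeof(double));

    MPI_Datatype vec;                  /* count blocks of 1 double, strided */
    MPI_Type_vector(count, 1, stride, MPI_DOUBLE, &vec);
    MPI_Type_commit(&vec);

    /* Variant 1: let the MPI library handle the strided layout. */
    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int r = 0; r < REPS; r++) {
        if (rank == 0)
            MPI_Send(data, 1, vec, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(data, 1, vec, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    double t_vec = (MPI_Wtime() - t0) / REPS;

    /* Variant 2: pack manually, send contiguously, unpack manually. */
    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (int r = 0; r < REPS; r++) {
        if (rank == 0) {
            for (int i = 0; i < count; i++) pack[i] = data[i * stride];
            MPI_Send(pack, count, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(pack, count, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            for (int i = 0; i < count; i++) data[i * stride] = pack[i];
        }
    }
    double t_pack = (MPI_Wtime() - t0) / REPS;

    if (rank == 1 && t_vec > t_pack)
        printf("datatype guideline violated: vector %.3es > pack %.3es\n",
               t_vec, t_pack);

    MPI_Type_free(&vec);
    free(data); free(pack);
    MPI_Finalize();
    return 0;
}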
In these new areas, algorithmic problems were studied, and in many cases new, better algorithms were proposed, implemented, and shown to improve performance and guideline adherence. The results of the project, and the further research it has stimulated, will have a long-term impact on improving the quality of implementations of the MPI standard.
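For the irregular collectives, one natural guideline states that a regular collective should perform no worse than its irregular generalization called with the same, uniform arguments, for example MPI_Gather versus MPI_Gatherv with equal counts and consecutive displacements. A minimal sketch of such a check, with the same simplified timing as the sketches above:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define REPS 100

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size, n = 1000;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double *send = malloc(n * sizeof(double));
    double *recv = malloc((size_t)n * size * sizeof(double));
    int *counts = malloc(size * sizeof(int));
    int *displs = malloc(size * sizeof(int));
    for (int i = 0; i < n; i++) send[i] = (double)rank;
    /* Uniform counts and consecutive displacements: the "regular" case. */
    for (int i = 0; i < size; i++) { counts[i] = n; displs[i] = i * n; }

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int r = 0; r < REPS; r++)
        MPI_Gather(send, n, MPI_DOUBLE, recv, n, MPI_DOUBLE, 0,
                   MPI_COMM_WORLD);
    double t_gather = (MPI_Wtime() - t0) / REPS;

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (int r = 0; r < REPS; r++)
        MPI_Gatherv(send, n, MPI_DOUBLE, recv, counts, displs, MPI_DOUBLE,
                    0, MPI_COMM_WORLD);
    double t_gatherv = (MPI_Wtime() - t0) / REPS;

    /* Guideline: MPI_Gather(n) <= MPI_Gatherv(n) with uniform arguments. */
    if (rank == 0 && t_gather > t_gatherv)
        printf("guideline violated: MPI_Gather %.3es > MPI_Gatherv %.3es\n",
               t_gather, t_gatherv);

    free(send); free(recv); free(counts); free(displs);
    MPI_Finalize();
    return 0;
}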
Research Institution
- Technische Universität Wien - 100%
International Project Participants
- Peter Sanders, Universität Karlsruhe - Germany
- Thomas Worsch, Universität Karlsruhe - Germany
- Torsten Hoefler, Eidgenössische Technische Hochschule Zürich - Switzerland
- Rajeev Thakur, Argonne National Laboratory - USA
- Robert A. van de Geijn, The University of Texas at Austin - USA
- William D. Gropp, University of Illinois - USA
Research Output
- 205 Citations
- 17 Publications
- 2014: Träff J, "Implementing a classic", conference paper, pp. 135-144. DOI: 10.1145/2597652.2597662
- 2014: Träff J, "Optimal MPI Datatype Normalization for Vector and Index-block Types", conference paper, pp. 33-38. DOI: 10.1145/2642769.2642771
- 2014: Hunold S, "Reproducible MPI Micro-Benchmarking Isn't As Easy As You Think", conference paper, pp. 69-76. DOI: 10.1145/2642769.2642785
- 2014: Träff J, "MPI Collectives and Datatypes for Hierarchical All-to-all Communication", conference paper, pp. 27-32. DOI: 10.1145/2642769.2642770
- 2014: Träff J, "Zero-copy, Hierarchical Gather is not possible with MPI Datatypes and Collectives", conference paper, pp. 39-44. DOI: 10.1145/2642769.2642772
- 2015: Träff J, "Specification Guideline Violations by MPI_Dims_create", conference paper, pp. 1-2. DOI: 10.1145/2802658.2802677
- 2015: Kalany M, "Efficient, Optimal MPI Datatype Reconstruction for Vector and Index Types", conference paper, pp. 1-10. DOI: 10.1145/2802658.2802671
- 2015: Träff J, "Isomorphic, Sparse MPI-like Collective Communication Operations for Parallel Stencil Computations", conference paper, pp. 1-10. DOI: 10.1145/2802658.2802663
- 2016: Hunold S, "Reproducible MPI Benchmarking is Still Not as Easy as You Think", journal article, IEEE Transactions on Parallel and Distributed Systems, pp. 3617-3630. DOI: 10.1109/tpds.2016.2539167
- 2016: Carpen-Amarie A, "On the Expected and Observed Communication Performance with MPI Derived Datatypes", conference paper, pp. 108-120. DOI: 10.1145/2966884.2966905
- 2016: Hunold S, "Automatic Verification of Self-consistent MPI Performance Guidelines", book chapter, Springer Nature, pp. 433-446. DOI: 10.1007/978-3-319-43659-3_32
- 2016: Ganian R, "Polynomial-time Construction of Optimal MPI Derived Datatype Trees", conference paper, pp. 638-647. DOI: 10.1109/ipdps.2016.13
- 2017: Carpen-Amarie A, "On expected and observed communication performance with MPI derived datatypes", journal article, Parallel Computing, pp. 98-117. DOI: 10.1016/j.parco.2017.08.006
- 2017: Lübbe F, "Micro-benchmarking MPI Neighborhood Collective Operations", book chapter, Springer Nature, pp. 65-78. DOI: 10.1007/978-3-319-64203-1_5
- 2017: Träff J, "Practical, linear-time, fully distributed algorithms for irregular gather and scatter", conference paper, pp. 1-10. DOI: 10.1145/3127024.3127025
- 2018: Hunold S, "Autotuning MPI Collectives using Performance Guidelines", conference paper, pp. 64-74. DOI: 10.1145/3149457.3149461
- 2018: Träff J, "Practical, distributed, low overhead algorithms for irregular gather and scatter collectives", journal article, Parallel Computing, pp. 100-117. DOI: 10.1016/j.parco.2018.04.003