FWF — Austrian Science Fund

Offline and Online Autotuning of Parallel Applications

Sascha Hunold (ORCID: 0000-0002-5280-3855)
  • Grant DOI 10.55776/P33884
  • Funding program Principal Investigator Projects
  • Status ended
  • Start July 1, 2021
  • End February 28, 2025
  • Funding amount € 260,673

Disciplines

Computer Sciences (100%)

Keywords

    Autotuning, MPI, Benchmarking, Reproducibility, Performance Models

Abstract

Many scientific applications, such as weather forecasting or earthquake simulations, need to be executed on large parallel machines to speed up the computation. These machines consist of hundreds or thousands of compute nodes, each of which is similar to a common desktop computer. Parallel applications are most often built on top of the Message Passing Interface (MPI), a standard for data communication between processes. As a result, the run-time of these applications depends on the efficiency of the underlying MPI implementation, so it is of utmost importance to provide the best possible MPI implementation for a given system. Much research has gone into developing scalable, efficient implementations of specific MPI functions. Consequently, MPI libraries offer a large set of algorithms and provide many run-time parameters that allow them to be adapted (tuned) to a given parallel machine. In our project, we will tackle the problem of optimizing the run-time parameters of MPI libraries in an automated fashion. The difficulty is that current MPI libraries provide several hundred tunable parameters, which results in a tremendously large search space. A brute-force approach that tests every combination of parameters would therefore take far too long and is impractical. Statistical methods can help us successively reduce the number of parameters that need to be considered, and we apply modern machine learning techniques to select the best possible algorithm for specific use cases. Overall, we will devise and develop a software prototype that automatically tunes MPI libraries to a given parallel machine.
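To make the size of the search space concrete, and to illustrate how a statistical screening step can shrink it before any algorithm selection takes place, here is a minimal Python sketch. The parameter names, their candidate values, and the `fake_benchmark` cost model are hypothetical stand-ins (not the parameters of any particular MPI library), and the one-factor-at-a-time screen is a simple substitute technique, not the statistical method used in the project.

```python
import math
import random
import statistics

# Hypothetical tunable parameters and candidate values (illustrative names,
# not the real parameters of any particular MPI library).
PARAMS = {
    "bcast_algorithm": [0, 1, 2, 3, 4, 5],
    "segment_size": [0, 1024, 8192, 65536],
    "tree_fanout": [2, 4, 8],
    "eager_limit": [4096, 16384, 65536],
    "use_shm": [False, True],
}


def search_space_size(params):
    """Number of configurations a brute-force search would have to benchmark."""
    return math.prod(len(values) for values in params.values())


def fake_benchmark(config, rng):
    """Stand-in for a real micro-benchmark; returns a noisy run-time in microseconds.
    In this toy model only bcast_algorithm and segment_size affect the run-time."""
    t = 100.0 + 15.0 * config["bcast_algorithm"] + 0.0005 * config["segment_size"]
    return t + rng.gauss(0.0, 1.0)


def screen_parameters(params, baseline, bench, reps=20, threshold=5.0, seed=0):
    """One-factor-at-a-time screening: vary each parameter around a baseline
    configuration and keep it only if its mean run-times differ by more than
    `threshold` microseconds. Interactions between parameters are not detected."""
    rng = random.Random(seed)
    kept = []
    for name, values in params.items():
        means = []
        for value in values:
            config = dict(baseline, **{name: value})
            means.append(statistics.mean(bench(config, rng) for _ in range(reps)))
        if max(means) - min(means) > threshold:
            kept.append(name)
    return kept


baseline = {name: values[0] for name, values in PARAMS.items()}
kept = screen_parameters(PARAMS, baseline, fake_benchmark)
print(search_space_size(PARAMS))                        # 432 configurations
print(kept)                                             # ['bcast_algorithm', 'segment_size']
print(search_space_size({k: PARAMS[k] for k in kept}))  # 24 configurations
```

With only five toy parameters the full grid already has 432 points; if each of 200 parameters had just two values, it would contain 2^200 configurations. Screening costs one sweep per parameter rather than one run per combination, at the price of missing interaction effects.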

Final report

Scientific applications running on supercomputers are almost exclusively based on the Message Passing Interface (MPI), which defines a set of functions that allow processes to communicate with each other. One type of communication is collective communication, in which a group of processes works together to perform a task. For example, the broadcast operation allows one process to send data to all other processes in the group. The Autotune project focuses on strategies for the automatic tuning of MPI collective communication operations. Tuning a collective operation means selecting the best algorithm and parameters for that operation. We have developed a tuning prototype that monitors the performance of different algorithms and parameters for collective operations and selects the best one based on the current workload and hardware characteristics. Two major problems had to be solved:

1. How do process arrival patterns impact the performance of collective operations?
2. How does synchronizing processes using a broadcast during benchmarking affect the benchmark results?

Benchmarking using Arrival Patterns: We address the challenge of optimizing MPI collective communication by taking process arrival patterns into account. Arrival imbalances, which are common in real-world applications, significantly affect the performance of collective algorithms. Through simulations and micro-benchmarking, we demonstrate that rooted collectives such as MPI_Reduce tolerate process skew better than non-rooted ones such as MPI_Allreduce. We propose a methodology that improves algorithm selection by profiling the arrival patterns of an application and applying the algorithm that performs best under them. Using the FT application from the NAS Parallel Benchmarks, we show that taking arrival patterns into account improves performance.

Benchmarking using Synchronized Clocks: We propose MPIX_Harmonize, an extension to the MPI standard that synchronizes processes in both space and time, minimizing artificial arrival patterns during benchmarking.
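The idea of synchronizing processes in time, rather than releasing them when a barrier completes, can be illustrated with a small simulation: every process estimates the offset of its clock relative to a reference process and then starts at an agreed instant on that reference clock. This is a minimal Python sketch under simplifying assumptions (simulated clocks without drift, perfectly symmetric message latency, a single ping-pong per process); `SimClock`, `estimate_offset`, and `local_start_time` are hypothetical names and do not reflect the interface or internals of MPIX_Harmonize.

```python
from dataclasses import dataclass


@dataclass
class SimClock:
    """Simulated clock of one process: local_time = global_time + offset (seconds).
    Clock drift is ignored in this toy model."""
    offset: float

    def read(self, global_time: float) -> float:
        return global_time + self.offset


def estimate_offset(ref: SimClock, local: SimClock, t_global: float, latency: float) -> float:
    """Ping-pong offset estimate: the local process sends at t0, the reference
    replies with its current timestamp, and the reply arrives at t1. Assuming
    symmetric one-way latency, the reference stamped its reply at the local
    midpoint (t0 + t1) / 2."""
    t0 = local.read(t_global)
    ref_stamp = ref.read(t_global + latency)
    t1 = local.read(t_global + 2.0 * latency)
    return (t0 + t1) / 2.0 - ref_stamp  # estimate of (local clock - reference clock)


def local_start_time(start_on_ref_clock: float, offset_estimate: float) -> float:
    """Translate a start time agreed on the reference clock into local-clock time.
    Each process would then spin on its own clock until this value is reached."""
    return start_on_ref_clock + offset_estimate


# Demo: three processes with hypothetical clock offsets.
reference = SimClock(offset=0.0)
processes = [SimClock(offset=o) for o in (0.0, 3e-4, -1.2e-3)]
start_ref = 1.0  # start instant chosen on the reference clock, slightly in the future

global_start = []
for p in processes:
    est = estimate_offset(reference, p, t_global=0.5, latency=2e-6)
    t_local = local_start_time(start_ref, est)
    # True (global) moment at which this process's local clock reaches t_local.
    global_start.append(t_local - p.offset)

skew = max(global_start) - min(global_start)
print(f"start skew: {skew:.3e} s")
```

In this idealized model the residual skew is only floating-point noise; in practice, asymmetric latencies and clock drift turn directly into residual skew, so a practical implementation would likely repeat the measurement and account for drift.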
MPIX_Harmonize achieves a synchronization accuracy of around one microsecond, a significant improvement over MPI_Barrier. By eliminating arrival-pattern artifacts, it makes benchmarking of MPI collective operations more reliable. Our analysis demonstrates its effectiveness in producing accurate and consistent performance measurements, and we encourage its adoption in high-performance computing environments.

Tuning HPC Applications at Runtime: We developed an online tuning strategy for MPI collective operations that dynamically selects algorithms based on performance data gathered during real application runs. This approach eliminates the need for prior offline benchmarking, making it more adaptable to changing workloads and hardware configurations. A key component of this strategy is a global performance model that is iteratively updated at runtime. The model tracks the performance of the different algorithms and adjusts their selection probabilities accordingly: for example, if a particular algorithm consistently performs well under certain conditions, its probability of being selected increases. To validate this approach, we used the miniAMR application, a benchmark for adaptive mesh refinement. Our experiments demonstrated significant performance gains for MPI_Allreduce operations.
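Adjusting selection probabilities from observed run-times can be sketched as a small online learner for a single collective call site. In this minimal Python sketch, the algorithm names are common MPI_Allreduce algorithms, but the run-times are made up, and the softmax over relative slowdown is an assumed stand-in, not the global performance model used in the project.

```python
import math
import random


class OnlineSelector:
    """Toy online tuner for one collective call site (e.g. one message size).
    Keeps a running mean run-time per algorithm and picks algorithms with
    softmax probabilities, so faster algorithms are chosen more often."""

    def __init__(self, algorithms, temperature=0.1, seed=0):
        self.algorithms = list(algorithms)
        self.temperature = temperature
        self.count = {a: 0 for a in self.algorithms}
        self.mean = {a: 0.0 for a in self.algorithms}
        self.rng = random.Random(seed)

    def probabilities(self):
        untried = [a for a in self.algorithms if self.count[a] == 0]
        if untried:  # explore: try every algorithm at least once
            return {a: (1.0 / len(untried) if a in untried else 0.0) for a in self.algorithms}
        best = min(self.mean.values())  # assumes positive run-times
        # Weight decays with the relative slowdown versus the current best mean.
        w = {a: math.exp(-(self.mean[a] / best - 1.0) / self.temperature) for a in self.algorithms}
        total = sum(w.values())
        return {a: w[a] / total for a in self.algorithms}

    def select(self):
        p = self.probabilities()
        return self.rng.choices(self.algorithms, weights=[p[a] for a in self.algorithms])[0]

    def update(self, algorithm, runtime):
        """Incremental mean update with one newly measured run-time."""
        self.count[algorithm] += 1
        self.mean[algorithm] += (runtime - self.mean[algorithm]) / self.count[algorithm]


# Hypothetical true run-times in microseconds, observed with measurement noise.
TRUE_US = {"ring": 120.0, "recursive_doubling": 80.0, "rabenseifner": 95.0}
noise = random.Random(1)
tuner = OnlineSelector(TRUE_US)
for _ in range(200):  # 200 collective calls during an application run
    alg = tuner.select()
    tuner.update(alg, TRUE_US[alg] + noise.gauss(0.0, 2.0))

print(tuner.count)
print({a: round(p, 3) for a, p in tuner.probabilities().items()})
```

A lower `temperature` exploits the current best algorithm more aggressively, while a higher one keeps exploring, which matters when workloads or hardware conditions change during a run. A real tuner would also keep separate statistics per message size and communicator, which this sketch omits.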

Research institution(s)
  • Technische Universität Wien - 50%
  • Universität Wien - 50%
Project participants
  • Siegfried Benkner, Universität Wien, associated research partner
International project participants
  • Balazs Gerofi, Research Center for Computational Science - Japan
  • George Bosilca, University of Tennessee - USA

Research Output

  • 29 Citations
  • 16 Publications
  • 1 Datasets & models
  • 1 Software
  • 1 Scientific Awards
Publications
  • 2025
    Title Mpisee: Communicator-Centric Profiling of MPI Applications
    DOI 10.1002/cpe.70158
    Type Journal Article
    Author Vardas I
    Journal Concurrency and Computation: Practice and Experience
  • 2023
    Title A Novel Triangular Space-Filling Curve for Cache-Oblivious In-Place Transposition of Square Matrices
    DOI 10.1109/ipdps54959.2023.00045
    Type Conference Proceeding Abstract
    Author Alves J
    Pages 368-378
  • 2024
    Title Exploring Scalability in C++ Parallel STL Implementations
    DOI 10.1145/3673038.3673065
    Type Conference Proceeding Abstract
    Author Laso R
    Pages 284-293
  • 2024
    Title MPI Collective Algorithm Selection in the Presence of Process Arrival Patterns
    DOI 10.1109/cluster59578.2024.00017
    Type Conference Proceeding Abstract
    Author Beni M
    Pages 108-119
  • 2024
    Title Modes, Persistence and Orthogonality: Blowing MPI Up
    DOI 10.1109/scw63240.2024.00061
    Type Conference Proceeding Abstract
    Author Träff J
    Pages 404-413
  • 2024
    Title Exploring Mapping Strategies for Co-allocated HPC Applications
    DOI 10.1007/978-3-031-48803-0_31
    Type Book Chapter
    Author Vardas I
    Publisher Springer Nature
    Pages 271-276
  • 2024
    Title Analysis and prediction of performance variability in large-scale computing systems
    DOI 10.1007/s11227-024-06040-w
    Type Journal Article
    Author Salimi Beni M
    Journal The Journal of Supercomputing
    Pages 14978-15005
  • 2023
    Title Synchronizing MPI Processes in Space and Time
    DOI 10.1145/3615318.3615325
    Type Conference Proceeding Abstract
    Author Schuchart J
    Pages 1-11
  • 2023
    Title Verifying Performance Guidelines for MPI Collectives at Scale
    DOI 10.1145/3624062.3625532
    Type Conference Proceeding Abstract
    Author Hunold S
    Pages 1264-1268
  • 2024
    Title Improved Parallel Application Performance and Makespan by Colocation and Topology-aware Process Mapping
    DOI 10.1109/ccgrid59990.2024.00023
    Type Conference Proceeding Abstract
    Author Vardas I
    Pages 119-124
  • 2023
    Title A Quantitative Analysis of OpenMP Task Runtime Systems
    DOI 10.1007/978-3-031-31180-2_1
    Type Book Chapter
    Author Hunold S
    Publisher Springer Nature
    Pages 3-18
  • 2022
    Title OMPICollTune: Autotuning MPI Collectives by Incremental Online Learning
    DOI 10.1109/pmbs56514.2022.00016
    Type Conference Proceeding Abstract
    Author Hunold S
    Pages 123-128
  • 2022
    Title An Overhead Analysis of MPI Profiling and Tracing Tools
    DOI 10.1145/3526063.3535353
    Type Conference Proceeding Abstract
    Author Hunold S
    Pages 5-13
  • 2022
    Title mpisee: MPI Profiling for Communication and Communicator Structure
    DOI 10.1109/ipdpsw55747.2022.00092
    Type Conference Proceeding Abstract
    Author Vardas I
    Pages 520-529
  • 2022
    Title Cache-oblivious Hilbert Curve-based Blocking Scheme for Matrix Transposition
    DOI 10.1145/3555353
    Type Journal Article
    Author Alves J
    Journal ACM Transactions on Mathematical Software
    Pages 1-28
  • 2021
    Title MicroBench Maker: Reproduce, Reuse, Improve
    DOI 10.1109/pmbs54543.2021.00013
    Type Conference Proceeding Abstract
    Author Hunold S
    Pages 69-74
Datasets & models
  • 2022
    Title Dataset: An Overhead Analysis of MPI Profiling and Tracing Tools
    DOI 10.5281/zenodo.6535636
    Type Database/Collection of data
    Public Access
Software
  • 2024
    Title Exploring Scalability in C++ Parallel STL Implementations - ICPP 2024 Artifact
    DOI 10.5281/zenodo.12187770
Scientific Awards
  • 2022
    Title Best Short Paper Award
    Type Research prize
    Level of Recognition Continental/International

Contact

Austrian Science Fund (FWF)
Georg-Coch-Platz 2
(Entrance Wiesingerstraße 4)
1010 Vienna

office(at)fwf.ac.at
+43 1 505 67 40

© Österreichischer Wissenschaftsfonds FWF