Compiler Technology for Top-Performance Signal Transforms
Disciplines
Computer Sciences (80%); Mathematics (20%)
Keywords
Fast Signal Transforms, FFT, SIMD vectorization, Compiler Backend, Special-Purpose Compilation
Since the discovery of the fast Fourier transform (FFT) by Cooley and Tukey in 1965, fast algorithms for the transformation of discrete signal data have become an integral part of many scientific applications. When multi-level memory hierarchies hit the general-purpose computing mainstream in the mid-1990s, the research focus in this field shifted from minimizing arithmetic operation counts to optimizing memory access operations, which eventually led to the emergence of a new paradigm in numerical software: automatic performance tuning. In the field of signal transforms, state-of-the-art automatic performance tuning software, such as SPIRAL or the award-winning FFTW, departed from previous efforts in two essential ways. First, it splits problem solving into two clearly distinct parts: search-based adaptation, which finds the most efficient way of solving a given problem on the target hardware, and the actual solving. Second, it uses domain-specific code generators to produce highly specialized kernel routines automatically. To achieve portability, the code generators of both FFTW and SPIRAL produce high-level C code and rely on general-purpose compilers to generate high-quality assembly code. A number of experiments have shown, however, that available general-purpose compilers do not compare favorably with skilled assembly-language coders. To close that gap, the MAP special-purpose compiler project was initiated in late 2000. By utilizing domain-specific knowledge and by specifically addressing the particular code generation issues of this field, the MAP compiler produces target-specific assembly code of unprecedented quality. As of late 2006, two major branches of this compiler have been developed: the first targets x86-compatible processors, while the most recent branch targets the processors of IBM's Blue Gene supercomputers.
The Blue Gene version of the MAP compiler was an important part of the computational science application Qbox, which was awarded the 2006 Gordon Bell Prize. The proposed project will further advance the development of the MAP compiler. First, its focus will be shifted towards AMD64-compatible processors, which are in widespread use from desktop computers up to supercomputers. Second, new methods of utilizing 4-way SIMD instructions will be devised and implemented in the compiler. Third, and most importantly, new techniques for optimizing memory accesses will be developed and integrated into the compiler. Within the proposed time frame of 24 months, activities will be directed at well-defined milestones regarding (i) compiler technology, (ii) proof-of-concept implementation in the MAP compiler, and (iii) high-performance signal transform software. The planned developments will boost the performance of many computational science applications, in particular by optimizing real-life cases that operate on large sets of signal data.
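The split between search-based adaptation and the actual solving can be illustrated with a minimal Python sketch. It is not taken from FFTW or SPIRAL; the names `plan`, `dft_naive`, and `fft_radix2` are hypothetical, and the candidate set and timing strategy are simplifying assumptions chosen only to show the idea of timing candidates once and reusing the winner.

```python
import cmath
import time

def dft_naive(x):
    """Direct O(n^2) evaluation of the discrete Fourier transform."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def fft_radix2(x):
    """Recursive Cooley-Tukey FFT; requires len(x) to be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def plan(n, candidates, repeats=3):
    """Search step (hypothetical): time each candidate on a size-n input
    and return the fastest one measured on this machine."""
    sample = [complex(i % 7, 0) for i in range(n)]
    best, best_time = None, float("inf")
    for f in candidates:
        elapsed = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            f(sample)
            elapsed = min(elapsed, time.perf_counter() - start)
        if elapsed < best_time:
            best, best_time = f, elapsed
    return best

transform = plan(64, [dft_naive, fft_radix2])  # adaptation, done once
spectrum = transform([1.0] + [0.0] * 63)       # actual solving, reusable
```

The cost of the search is paid once per problem size; every later transform of that size reuses the selected routine.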
Discrete signal data plays a particularly important role in numerical simulation and audio/video multimedia applications. To achieve maximum efficiency, application programmers typically rely on specialized program libraries that comprise a suitable collection of problem-specific routines. For reasons of maintenance and ease of portability, these routines are written in a high-level programming language like C or Fortran. A dedicated program (a compiler) is employed to translate high-level code to machine code tailored to a particular processor architecture. Optimizing compilers apply numerous optimization techniques to minimize both the runtime and the size of the machine code they produce. In this research project, several new techniques for optimizing signal transform codes running on Intel64/AMD64 processors have been developed and implemented in a compiler named `NXyn`. NXyn operates on the level of Intel64/AMD64 assembly code to allow integration with both open-source and closed-source compilers, such as the proprietary top-of-the-line Intel C compiler `icc`. The newly devised techniques are combined with established ones: NXyn post-processes the assembly code produced by a general-purpose C compiler and applies its specific code optimizations. When NXyn is coupled with the state-of-the-art Intel C compiler `icc` and the widely used signal transform library `FFTW` is compiled with maximum optimization flags, the new techniques implemented in NXyn reduce runtime and code size by up to 30%. Experiments show that the techniques can be applied to different contemporary Intel64/AMD64 processors (Intel Core 2, Intel Core i3, AMD Phenom, AMD Phenom II). Intel64/AMD64 processors are widely used in mobile internet devices (MIDs), notebook and desktop PCs, servers, and supercomputers: at the end of 2010, almost a billion PCs and more than 80% of the Top500 supercomputers were powered by Intel64/AMD64 processors.
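The general idea of post-processing compiler-generated assembly can be sketched as a small peephole pass in Python. This is not one of NXyn's optimizations, which this summary does not describe; it is a textbook example under stated assumptions: AT&T syntax, only `movq` with 64-bit general-purpose registers, and only directly adjacent instructions. The function name `drop_redundant_reloads` is hypothetical.

```python
import re

# Matches simple AT&T-syntax moves such as "movq %rax, 8(%rsp)".
# Operands that contain commas, e.g. (%rax,%rbx,8), are deliberately
# not matched, so such lines are always left untouched.
MOVQ = re.compile(r"^\s*movq\s+([^,]+),\s*(\S+)\s*$")

def drop_redundant_reloads(lines):
    """Remove a load that directly follows a store of the same register
    to the same memory operand (the register already holds the value)."""
    out = []
    for line in lines:
        cur = MOVQ.match(line)
        prev = MOVQ.match(out[-1]) if out else None
        if (cur and prev
                and prev.group(1).startswith("%r")      # store from a 64-bit GPR
                and not prev.group(2).startswith("%")   # store goes to memory
                and cur.group(1) == prev.group(2)       # reload from that memory
                and cur.group(2) == prev.group(1)):     # into the same register
            continue  # skip the redundant reload
        out.append(line)
    return out

asm = [
    "movq %rax, 8(%rsp)",
    "movq 8(%rsp), %rax",
    "addq %rbx, %rax",
]
optimized = drop_redundant_reloads(asm)
```

Because such a pass works on assembly text, it can be placed behind any compiler that emits assembly, which is the integration property the summary attributes to operating at this level.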
- Bundesland Niederösterreich - 100%
- Matteo Frigo, IBM - India
- Steven G. Johnson, Massachusetts Institute of Technology - USA