Projectdetail

Grant DOI 10.55776/P21842
Funding program Principal Investigator Projects
Status ended
Start October 1, 2009
End December 31, 2013
Funding amount € 465,213
Project website

Disciplines

Computer Sciences (80%); Mathematics (20%)

Keywords

Compiler, Instruction Level Parallelism, VLIW, Integer Linear Programming, Scheduling, Combinatorial Optimization

Abstract

Final report

Embedded system designers today are faced with computational workloads that have been state-of-the-art in the supercomputing domain not long ago. However, normalized to the same technology, a 2 - 3x speedup with traditional techniques very roughly requires about an increase of 80x in area and, maybe even more important, about 12x in power consumption - a cost too high to bear for most pervasive mobile applications. A popular approach among embedded systems designers are VLIW architectures employing a large number of parallel datapaths and functional units as well as application specific instruction set extensions that are taylored to a particular application. Those systems rely on highly optimizing compilers that apply aggressive code transformations in order to acquire sufficient amounts of parallelism. However, ILP-oriented optimizations impose complex trade-offs that often cannot be accomplished by traditional heuristic approaches. Additionally, application- specific architectural restrictions and asymmetries challenge the development of compilers in several ways. First, the large number of application-specific architectural variants imposes a major engineering problem that is hard to cope with using traditional compiler backends. Second, heuristic techniques often fail to effectively exploit those architectural features, thus resulting in inefficient code. Techniques based on combinatorial optimization such as integer linear programming have the potential to overcome those deficiencies. Mathematical formulations allow for combined optimization of several subproblems and are capable to formulate complex architectural constraints that can be automatically derived from abstract machine descriptions. However, previously proposed approaches have achieved little relevance in practice. The main reason is that several arising subproblems are known to be NP complete in general and naive formulations inevitably lead to unacceptable compile time. The aim of this project is to develop new algorithms and mathematical formulations that maintain the advantages of integer linear programming based code generation techniques while remaining computational feasible for real- world programs. This includes the application of well-known techniques from the operations research domain to decrease the required solver time such as cutting plane algorithms, column generation techniques, or Lagrangian relaxation. As some of the subproblems are known to be computationally hard for real-world instances, we want to use the developed models also to learn when and why established heuristics fail, and to develop efficient approximation algorithms and near-optimal techniques that remain computationally feasible even for large problems. Our focus is on phase ordering issues among scheduling and register allocation, integrated scheduling of cyclic regions (software pipelining), and support for irregular architectural constraints and asymmetric datapaths. Instead of fully integrated techniques, we propose a novel compilation scheme that decomposes register allocation into several subtasks that can be solved separately. This approach allows to keep the scheduling formulation comparatively small by maintaining only simple additional invariants.

The focus of this project was on research for optimal and near optimal code generation for explicitly parallel processors. Explicitly parallel processors can execute more than one instruction every cycle, but the programmer (or the compiler) has to explicitly specify which instructions are independent from each other and can be executed in parallel in the same cycle. These processors are mostly used in embedded applications like smart phones and often have hardware constraints like clustered functional units and register files, predicated execution or rotating register files. Code generation is the final machine dependent part of a compiler and contains passes like instruction selection, register allocation, instruction scheduling, software pipelining, if-conversion and cluster assignment. Finding optimal solutions for these passes is often computationally intractable. Furthermore, these passes are usually executed independently from each other resulting in non optimal code. The results of this project were new heuristic and exact methods for optimal and near optimal integrated code generation. Heuristic methods give nearly as good results as optimal methods and are fast enough to be incorporated in production compilers.

Research institution(s)

Technische Universität Wien - 100%

Research Output

44 Citations
14 Publications

Publications

Title	Optimal and Heuristic Global Code Motion for Minimal Spilling
DOI	10.1007/978-3-642-37051-9_2
Type	Book Chapter
Author	Barany G
Publisher	Springer Nature
Pages	21-40
Link	Publication

Title	IR-level versus machine-level if-conversion for predicated architectures
DOI	10.1145/2443608.2443611
Type	Conference Proceeding Abstract
Author	Jordan A
Pages	3-10
Link	Publication

Title	Evaluating and estimating the WCET criticality metric
DOI	10.1145/2568326.2568331
Type	Conference Proceeding Abstract
Author	Jordan A
Pages	11-18

Title	Integrated modulo scheduling and cluster assignment for TI TMS320C64x+ architecture
DOI	10.1145/2568326.2568327
Type	Conference Proceeding Abstract
Author	Kim N
Pages	25-32
Link	Publication

Title	Refinement of worst-case execution time bounds by graph pruning
DOI	10.1016/j.cl.2014.09.001
Type	Journal Article
Author	Brandner F
Journal	Computer Languages, Systems & Structures
Pages	155-170

Title	Static analysis of worst-case stack cache behavior
DOI	10.1145/2516821.2516828
Type	Conference Proceeding Abstract
Author	Jordan A
Pages	55-64

Title	Static profiling of the worst-case in real-time programs
DOI	10.1145/2392987.2393000
Type	Conference Proceeding Abstract
Author	Brandner F
Pages	101-110

Title	Computation of alias sets from shape graphs for comparison of shape analysis precision
DOI	10.1049/iet-sen.2012.0049
Type	Journal Article
Author	Pavlu V
Journal	IET Software
Pages	120-133
Link	Publication

Title	Optimistic integrated instruction scheduling and register allocation.
Type	Conference Proceeding Abstract
Author	Barany G
Conference	15th Workshop on Compilers for Parallel Computing (CPC 2010)

Title	Optimal and heuristic code generation for explicitly parallel processors.
Type	Conference Proceeding Abstract
Author	Barany G
Conference	ACACES 2010 Poster Abstracts. HiPEAC, 2010. Posterpräsentation: ACACES 2010, Terrassa (Barcelona), Spanien; 2010-07-11 - 2010-07-17

Title	Fast JIT code generation for x86-64 with LLVM.
Type	Conference Proceeding Abstract
Author	Krall A
Conference	ACACES 2011 Poster Abstracts, 7, Fiuggi, Italy, 2011. HiPEAC. Posterpräsentation: ACACES 2011, Fiuggi, Italien; 2011-07-10 - 2011-07-16

Title	Register reuse scheduling.
Type	Conference Proceeding Abstract
Author	Barany G
Conference	9th Workshop on Optimizations for DSP and Embedded Systems (ODES'11)

Title	Graph-based cluster assignment for VLIW architectures.
Type	Conference Proceeding Abstract
Author	Jordan A
Conference	Posterpräsentation: 19th International Conference on Parallel Architectures and Compilation Techniques (PACT), Wien; 2010-09-11 - 2010-09-15

Title	Optimistic integrated instruction scheduling and register allocation.
Type	Conference Proceeding Abstract
Author	Barany G
Conference	Proceedings of the Junior Scientist Conference 2010; Posterpräsentation: Junior Scientist Conference 2010, Vienna; 2010-04-07 - 2010-04-09

Go to overview page Discover

Go to overview page Funding

Go to overview page About Us

Go to overview page News

Optimal Code Generation for Explicitly Parallel Processors

Optimal Code Generation for Explicitly Parallel Processors

Disciplines

Keywords

Research Output

Contact

General information

Go to overview page Discover

Go to overview page Funding

Go to overview page About Us

Go to overview page News

SOCIAL MEDIA

SCILOG

Optimal Code Generation for Explicitly Parallel Processors

Optimal Code Generation for Explicitly Parallel Processors

Disciplines

Keywords

Research Output