A methodology pruning the search space of six compiler transformations by addressing them together as one problem and by exploiting the hardware architecture details

KELEFOURAS, Vasileios (2017). A methodology pruning the search space of six compiler transformations by addressing them together as one problem and by exploiting the hardware architecture details. Computing, 99 (9), 865-888.

[img]
Preview
PDF
Kelefouras-MethodologyPruningTheSearchSpace(AM).pdf - Accepted Version
All rights reserved.

Download (1MB) | Preview
Official URL: https://link.springer.com/article/10.1007%2Fs00607...
Link to published version:: https://doi.org/10.1007/s00607-016-0535-4
Related URLs:

    Abstract

    Today’s compilers have a plethora of optimizations-transformations to choose from, and the correct choice, order as well parameters of transformations have a significant/large impact on performance; choosing the correct order and parameters of optimizations has been a long standing problem in compilation research, which until now remains unsolved; the separate sub-problems optimization gives a different schedule/binary for each sub-problem and these schedules cannot coexist, as by refining one degrades the other. Researchers try to solve this problem by using iterative compilation techniques but the search space is so big that it cannot be searched even by using modern supercomputers. Moreover, compiler transformations do not take into account the hardware architecture details and data reuse in an efficient way. In this paper, a new iterative compilation methodology is presented which reduces the search space of six compiler transformations by addressing the above problems; the search space is reduced by many orders of magnitude and thus an efficient solution is now capable to be found. The transformations are the following: loop tiling (including the number of the levels of tiling), loop unroll, register allocation, scalar replacement, loop interchange and data array layouts. The search space is reduced (a) by addressing the aforementioned transformations together as one problem and not separately, (b) by taking into account the custom hardware architecture details (e.g., cache size and associativity) and algorithm characteristics (e.g., data reuse). The proposed methodology has been evaluated over iterative compilation and gcc/icc compilers, on both embedded and general purpose processors; it achieves significant performance gains at many orders of magnitude lower compilation time.

    Item Type: Article
    Uncontrolled Keywords: 68N20 Compilers and interpreters, Cache, Data reuse, Iterative compilation, Loop tiling, Loop transformations, Loop unroll, Register allocation, Scalar replacement
    Departments - Does NOT include content added after October 2018: Faculty of Science, Technology and Arts > Department of Computing
    Identification Number: https://doi.org/10.1007/s00607-016-0535-4
    Page Range: 865-888
    Depositing User: Vasileios Kelefouras
    Date Deposited: 27 Mar 2018 09:26
    Last Modified: 22 Jun 2020 14:30
    URI: http://shura.shu.ac.uk/id/eprint/18351

    Actions (login required)

    View Item View Item

    Downloads

    Downloads per month over past year

    View more statistics