A high-performance matrix-matrix multiplication methodology for CPU and GPU architectures

Kelefouras, Vasileios; Kritikakou, Angeliki; Mporas, Iosif; Kolonias, Vasileios

KELEFOURAS, Vasileios, KRITIKAKOU, Angeliki, MPORAS, Iosif and KOLONIAS, Vasileios (2016). A high-performance matrix-matrix multiplication methodology for CPU and GPU architectures. Journal of Supercomputing, 72 (3), 804-844. [Article]

Documents

Kelefouras-HighPerformanceMatrix-MatrixMaultiplications(AM).pdf - Accepted Version
Available under License All rights reserved.

Abstract

Current compilers cannot generate code that competes with hand-tuned code in efficiency, even for a simple kernel such as matrix–matrix multiplication (MMM). A key step in program optimization is estimating optimal values for parameters such as tile sizes and the number of tiling levels. Selecting scheduling parameter values is difficult and time-consuming because the values depend on each other, which is why they are usually found with search methods and empirical techniques. To overcome this problem, the scheduling sub-problems must be optimized together, as one problem, rather than separately. This paper presents an MMM methodology in which the optimum scheduling parameters are found by reducing the search space theoretically, while the major scheduling sub-problems are addressed together as one problem, according to the hardware architecture parameters and the input size; for different hardware architecture parameters and/or input sizes, a different implementation is produced. This is achieved by fully exploiting the software characteristics (e.g., data reuse) and the hardware architecture parameters (e.g., data cache sizes and associativities), giving high-quality solutions and a smaller search space. The methodology applies to a wide range of CPU and GPU architectures.
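
To illustrate what a tile size derived from hardware parameters might look like, the following is a minimal sketch and not the paper's methodology. It assumes a common textbook heuristic: choose the largest square tile such that three tiles (one each from A, B and C) fit in a data cache of a given size. The names `pick_tile_size`, `matmul_tiled` and `matmul_naive`, the 32 KB cache size and the 8-byte element size are all hypothetical choices for this example; cache associativity, which the abstract also mentions, is ignored here.

```python
import math


def pick_tile_size(cache_bytes, elem_bytes=8, tiles_in_cache=3):
    """Largest T such that `tiles_in_cache` T x T tiles fit in `cache_bytes`.

    Simplifying assumption: only capacity matters (associativity is ignored).
    """
    return max(1, math.isqrt(cache_bytes // (tiles_in_cache * elem_bytes)))


def matmul_naive(a, b):
    """Reference triple-loop product of two list-of-lists matrices."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


def matmul_tiled(a, b, tile):
    """One-level tiled (blocked) product; same result as matmul_naive."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    # Outer loops walk over tiles; inner loops stay inside one tile so that
    # the tile's elements are reused while they are (ideally) cache resident.
    for ii in range(0, n, tile):
        for pp in range(0, k, tile):
            for jj in range(0, m, tile):
                for i in range(ii, min(ii + tile, n)):
                    row_a, row_c = a[i], c[i]
                    for p in range(pp, min(pp + tile, k)):
                        a_ip, row_b = row_a[p], b[p]
                        for j in range(jj, min(jj + tile, m)):
                            row_c[j] += a_ip * row_b[j]
    return c


if __name__ == "__main__":
    tile = pick_tile_size(32 * 1024)  # hypothetical 32 KB L1 data cache
    a = [[i + j for j in range(50)] for i in range(40)]
    b = [[i - j for j in range(30)] for i in range(50)]
    print(tile, matmul_tiled(a, b, tile) == matmul_naive(a, b))
```

Pure Python will not show the cache effect; the sketch only shows the loop structure and how a tile size can follow from a capacity constraint instead of an empirical search. The abstract describes a more complete approach that also accounts for associativity, multiple tiling levels and input size.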

More Information

Official URL:

https://link.springer.com/article/10.1007%2Fs11227...

Departments - Does NOT include content added after October 2018:

Faculty of Science, Technology and Arts > Department of Computing

Identifiers

Identification Number:

10.1007/s11227-015-1613-7

ORCID for Vasileios Kelefouras: