Article,

Anatomy of High-performance Matrix Multiplication

, and .
ACM Trans. Math. Softw., 34 (3): 12:1--12:25 (May 2008)
DOI: 10.1145/1356052.1356053

Abstract

We present the basic principles that underlie the high-performance implementation of the matrix-matrix multiplication that is part of the widely used GotoBLAS library. Design decisions are justified by successively refining a model of architectures with multilevel memories. A simple but effective algorithm for executing this operation results. Implementations on a broad selection of architectures are shown to achieve near-peak performance.
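The abstract describes the algorithm only in outline. As a rough illustration, here is a minimal C sketch of a blocked, packed matrix multiplication. It loosely follows the layered GEPP/GEBP decomposition associated with GotoBLAS:

1. A kc-deep panel of B is packed once into a contiguous buffer.
2. Each mc × kc block of A is packed so that it can stay resident in cache.
3. A small MR × NR micro-kernel accumulates into local variables before updating C.

Several details are assumptions for illustration only: the block sizes (`MC`, `KC`, `MR`, `NR`), the function names (`gemm_blocked`, `pack_a`, `pack_b`, `micro_kernel`), and the column-major layout. None of these are the paper's tuned values or the GotoBLAS interface. The plain-C micro-kernel also stands in for the hand-optimized kernels of a real implementation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical block sizes for illustration only; real values are tuned
   per architecture (cache sizes, TLB reach, number of registers). */
enum { MC = 64, KC = 128, MR = 4, NR = 4 };
_Static_assert(MC % MR == 0, "MC must be a multiple of MR");

/* Column-major element access with leading dimension ld. */
#define ELEM(M, ld, i, j) ((M)[(size_t)(j) * (ld) + (i)])

/* Pack an mb x kb block of A into MR-row slivers: within a sliver, the MR
   entries of each column p are contiguous. Rows past mb are zero-padded. */
static void pack_a(int mb, int kb, const double *A, int lda, double *Ap) {
    for (int ir = 0; ir < mb; ir += MR)
        for (int p = 0; p < kb; ++p)
            for (int i = 0; i < MR; ++i)
                *Ap++ = (ir + i < mb) ? ELEM(A, lda, ir + i, p) : 0.0;
}

/* Pack a kb x nb panel of B into NR-column slivers: within a sliver, the NR
   entries of each row p are contiguous. Columns past nb are zero-padded. */
static void pack_b(int kb, int nb, const double *B, int ldb, double *Bp) {
    for (int jr = 0; jr < nb; jr += NR)
        for (int p = 0; p < kb; ++p)
            for (int j = 0; j < NR; ++j)
                *Bp++ = (jr + j < nb) ? ELEM(B, ldb, p, jr + j) : 0.0;
}

/* Accumulate one MR x NR tile of (packed A sliver) * (packed B sliver) in a
   local array, then add only the valid mr x nr part into C. */
static void micro_kernel(int kb, const double *a, const double *b,
                         double *C, int ldc, int mr, int nr) {
    double acc[MR][NR] = {{0.0}};
    for (int p = 0; p < kb; ++p, a += MR, b += NR)
        for (int i = 0; i < MR; ++i)
            for (int j = 0; j < NR; ++j)
                acc[i][j] += a[i] * b[j];
    for (int i = 0; i < mr; ++i)
        for (int j = 0; j < nr; ++j)
            ELEM(C, ldc, i, j) += acc[i][j];
}

/* C += A * B with A m x k, B k x n, C m x n, all column-major.
   Returns 0 on success, -1 if a packing buffer cannot be allocated. */
int gemm_blocked(int m, int n, int k, const double *A, int lda,
                 const double *B, int ldb, double *C, int ldc) {
    assert(m >= 0 && n >= 0 && k >= 0);
    size_t n_pad = (size_t)(n + NR - 1) / NR * NR;
    double *Ap = malloc(sizeof(double) * MC * KC);
    double *Bp = malloc(sizeof(double) * KC * (n_pad ? n_pad : 1));
    if (!Ap || !Bp) { free(Ap); free(Bp); return -1; }
    for (int pc = 0; pc < k; pc += KC) {             /* GEPP: rank-kc update */
        int kb = k - pc < KC ? k - pc : KC;
        pack_b(kb, n, &ELEM(B, ldb, pc, 0), ldb, Bp);
        for (int ic = 0; ic < m; ic += MC) {         /* GEBP: one mc x kc block */
            int mb = m - ic < MC ? m - ic : MC;
            pack_a(mb, kb, &ELEM(A, lda, ic, pc), lda, Ap);
            for (int jr = 0; jr < n; jr += NR)       /* NR-column slivers of B */
                for (int ir = 0; ir < mb; ir += MR)  /* MR-row slivers of A */
                    micro_kernel(kb, Ap + (size_t)ir * kb, Bp + (size_t)jr * kb,
                                 &ELEM(C, ldc, ic + ir, jr), ldc,
                                 mb - ir < MR ? mb - ir : MR,
                                 n - jr < NR ? n - jr : NR);
        }
    }
    free(Ap);
    free(Bp);
    return 0;
}
```

For contiguous column-major arrays, calling `gemm_blocked(m, n, k, A, m, B, k, C, m)` updates C in place. Zero-padding the packed buffers lets the micro-kernel always run on full MR × NR tiles, so only the final write-back needs trimming at the matrix edges.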
