Author of the publication

Optimizing Matrix Multiplication on Intel® Xeon Phi™ x200 Architecture.

ARITH, pages 144-145. IEEE Computer Society, (2017)

Other publications of authors with the same name

High-performance implementation of the level-3 BLAS. ACM Trans. Math. Softw., 35 (1): 4:1-4:14 (2008)

NUMA-optimized parallel breadth-first search on multicore single-node system. IEEE BigData, pages 394-402. IEEE Computer Society, (2013)

High performance dense linear algebra on a spatially distributed processor. PPoPP, pages 63-72. ACM, (2008)

Anatomy of high-performance matrix multiplication. ACM Trans. Math. Softw., 34 (3): 12:1-12:25 (2008)

BLAS (Basic Linear Algebra Subprograms). Encyclopedia of Parallel Computing, Springer, (2011)

Toward Scalable Matrix Multiply on Multithreaded Architectures. Euro-Par, volume 4641 of Lecture Notes in Computer Science, pages 748-757. Springer, (2007)