Author of the publication

Algorithms and optimization techniques for high-performance matrix-matrix multiplications of very small matrices.

Parallel Comput., (2019)

Other publications of authors with the same name

Performance optimization of Sparse Matrix-Vector Multiplication for multi-component PDE-based applications using GPUs. Concurr. Comput. Pract. Exp., 28 (12): 3447-3465 (2016)
Analysis and Design Techniques towards High-Performance and Energy-Efficient Dense Linear Solvers on GPUs. IEEE Trans. Parallel Distributed Syst., 29 (12): 2700-2712 (2018)
Matrix multiplication on batches of small matrices in half and half-complex precisions. J. Parallel Distributed Comput., (2020)
Evaluating the Performance of NVIDIA's A100 Ampere GPU for Sparse and Batched Computations. PMBS@SC, page 26-38. IEEE, (2020)
Portable and Efficient Dense Linear Algebra in the Beginning of the Exascale Era. P3HPC@SC, page 36-46. IEEE, (2022)
Progressive Optimization of Batched LU Factorization on GPUs. HPEC, page 1-6. IEEE, (2019)
High Performance Multi-GPU SpMV for Multi-component PDE-Based Applications. Euro-Par, volume 9233 of Lecture Notes in Computer Science, page 601-612. Springer, (2015)
Design, Optimization, and Benchmarking of Dense Linear Algebra Algorithms on AMD GPUs. HPEC, page 1-7. IEEE, (2020)
Parallel Programming Models for Dense Linear Algebra on Heterogeneous Systems. Supercomput. Front. Innov., 2 (4): 67-86 (2015)
Fast Cholesky factorization on GPUs for batch and native modes in MAGMA. J. Comput. Sci., (2017)