Author of the publication

Fast Batched Matrix Multiplication for Small Sizes Using Half-Precision Arithmetic on GPUs.

, , and . IPDPS, pages 111-122. IEEE, (2019)


Other publications of authors with the same name

Performance optimization of Sparse Matrix-Vector Multiplication for multi-component PDE-based applications using GPUs., , , and . Concurr. Comput. Pract. Exp., 28 (12): 3447-3465 (2016)

Matrix multiplication on batches of small matrices in half and half-complex precisions., , and . J. Parallel Distributed Comput., (2020)

Analysis and Design Techniques towards High-Performance and Energy-Efficient Dense Linear Solvers on GPUs., , , and . IEEE Trans. Parallel Distributed Syst., 29 (12): 2700-2712 (2018)

Evaluating the Performance of NVIDIA's A100 Ampere GPU for Sparse and Batched Computations., , , , and . PMBS@SC, pages 26-38. IEEE, (2020)

High Performance Multi-GPU SpMV for Multi-component PDE-Based Applications., , and . Euro-Par, volume 9233 of Lecture Notes in Computer Science, pages 601-612. Springer, (2015)

Portable and Efficient Dense Linear Algebra in the Beginning of the Exascale Era., , , , , , , , and . P3HPC@SC, pages 36-46. IEEE, (2022)

Design, Optimization, and Benchmarking of Dense Linear Algebra Algorithms on AMD GPUs., , , and . HPEC, pages 1-7. IEEE, (2020)

Progressive Optimization of Batched LU Factorization on GPUs., , and . HPEC, pages 1-6. IEEE, (2019)

Parallel Programming Models for Dense Linear Algebra on Heterogeneous Systems., , , , , , , , , and . Supercomput. Front. Innov., 2 (4): 67-86 (2015)

Systematic Approach in Optimizing Numerical Memory-Bound Kernels on GPU., , and . Euro-Par Workshops, volume 7640 of Lecture Notes in Computer Science, pages 207-216. Springer, (2012)