Author of the publication

The Design and Performance of Batched BLAS on Modern High-Performance Computing Systems.

, , , , , and . ICCS, volume 108 of Procedia Computer Science, page 495-504. Elsevier, (2017)

Other publications of authors with the same name

Accelerating fluid-solid simulations (Lattice-Boltzmann & Immersed-Boundary) on heterogeneous architectures., , , , and . J. Comput. Sci., (2015)

MPI+OpenMP Tasking Scalability for Multi-Morphology Simulations of the Human Brain., , , and . CoRR, (2020)

A GPU-Based Implementation for Range Queries on Spaghettis Data Structure., , , , and . ICCSA (1), volume 6782 of Lecture Notes in Computer Science, page 615-629. Springer, (2011)

cuConv: A CUDA Implementation of Convolution for CNN Inference., , and . CoRR, (2021)

cuConv: CUDA implementation of convolution for CNN inference., , and . Clust. Comput., 25 (2): 1459-1473 (2022)

SparseLU, A Novel Algorithm and Math Library for Sparse LU Factorization., , and . IA3@SC, page 25-31. IEEE, (2022)

Moment Representation of Regularized Lattice Boltzmann Methods on NVIDIA and AMD GPUs., , , and . SC Workshops, page 1697-1704. ACM, (2023)

Tiling Framework for Heterogeneous Computing of Matrix based Tiled Algorithms., , , , and . ExHET@PPoPP, page 1:1-1:6. ACM, (2023)

Leveraging the Performance of LBM-HPC for Large Sizes on GPUs Using Ghost Cells.. ICA3PP, volume 10048 of Lecture Notes in Computer Science, page 417-430. Springer, (2016)

Heuristics for ROSA's LTS Searching., , , , and . IWANN (2), volume 10306 of Lecture Notes in Computer Science, page 427-437. Springer, (2017)