Author of the publication

On the Development of Variable Size Batched Computation for Heterogeneous Parallel Architectures.

IPDPS Workshops, page 1249-1258. IEEE Computer Society, (2016)


Other publications of authors with the same name

Evaluating the Performance of NVIDIA's A100 Ampere GPU for Sparse and Batched Computations. PMBS@SC, page 26-38. IEEE, (2020)

Performance optimization of Sparse Matrix-Vector Multiplication for multi-component PDE-based applications using GPUs. Concurr. Comput. Pract. Exp., 28 (12): 3447-3465 (2016)

High Performance Multi-GPU SpMV for Multi-component PDE-Based Applications. Euro-Par, volume 9233 of Lecture Notes in Computer Science, page 601-612. Springer, (2015)

Portable and Efficient Dense Linear Algebra in the Beginning of the Exascale Era. P3HPC@SC, page 36-46. IEEE, (2022)

Design, Optimization, and Benchmarking of Dense Linear Algebra Algorithms on AMD GPUs. HPEC, page 1-7. IEEE, (2020)

Progressive Optimization of Batched LU Factorization on GPUs. HPEC, page 1-6. IEEE, (2019)

Optimizing GPU Kernels for Irregular Batch Workloads: A Case Study for Cholesky Factorization. HPEC, page 1-7. IEEE, (2018)

Performance, Design, and Autotuning of Batched GEMM for GPUs. ISC, volume 9697 of Lecture Notes in Computer Science, page 21-38. Springer, (2016)

Towards Half-Precision Computation for Complex Matrices: A Case Study for Mixed Precision Solvers on GPUs. ScalA@SC, page 17-24. IEEE, (2019)

Factorization and Inversion of a Million Matrices using GPUs: Challenges and Countermeasures. ICCS, volume 108 of Procedia Computer Science, page 606-615. Elsevier, (2017)