Matrix multiplication on batches of small matrices in half and half-complex precisions.

J. Parallel Distributed Comput., (2020)

Other publications of authors with the same name

On the Development of Variable Size Batched Computation for Heterogeneous Parallel Architectures. IPDPS Workshops, pages 1249-1258. IEEE Computer Society, (2016)

KBLAS: An Optimized Library for Dense Matrix-Vector Multiplication on GPU Accelerators. CoRR, (2014)

High-performance Cholesky factorization for GPU-only execution. GPGPU@PPoPP, pages 42-52. ACM, (2017)

With Extreme Computing, the Rules Have Changed. Comput. Sci. Eng., 19 (3): 52-62 (2017)

Performance Tuning and Optimization Techniques of Fixed and Variable Size Batched Cholesky Factorization on GPUs. ICCS, volume 80 of Procedia Computer Science, pages 119-130. Elsevier, (2016)

Optimizing Memory-Bound SYMV Kernel on GPU Hardware Accelerators. VECPAR, volume 7851 of Lecture Notes in Computer Science, pages 72-79. Springer, (2012)

Algorithms and optimization techniques for high-performance matrix-matrix multiplications of very small matrices. Parallel Comput., (2019)

Batched one-sided factorizations of tiny matrices using GPUs: Challenges and countermeasures. J. Comput. Sci., (2018)

A Survey of Numerical Methods Utilizing Mixed Precision Arithmetic. CoRR, (2020)

High Performance Pseudo-analytical Simulation of Multi-Object Adaptive Optics over Multi-GPU Systems. Euro-Par, volume 8632 of Lecture Notes in Computer Science, pages 704-715. Springer, (2014)