Author of the publication

Analysis and Design Techniques towards High-Performance and Energy-Efficient Dense Linear Solvers on GPUs.

IEEE Trans. Parallel Distributed Syst., 29 (12): 2700-2712 (2018)


Other publications of authors with the same name

CPU-GPU hybrid bidiagonal reduction with soft error resilience. ScalA@SC, page 2:1-2:5. ACM, (2013)

Harnessing GPU tensor cores for fast FP16 arithmetic to speed up mixed-precision iterative refinement solvers. SC, page 47:1-47:11. IEEE / ACM, (2018)

Bi-objective scheduling algorithms for optimizing makespan and reliability on heterogeneous systems. SPAA, page 280-288. ACM, (2007)

GPU-Aware Non-contiguous Data Movement In Open MPI. HPDC, page 231-242. ACM, (2016)

Optimized Batched Linear Algebra for Modern Architectures. Euro-Par, volume 10417 of Lecture Notes in Computer Science, page 511-522. Springer, (2017)

Selected Results from the ParkBench Benchmark. Euro-Par, Vol. II, volume 1124 of Lecture Notes in Computer Science, page 251-254. Springer, (1996)

A Scalable Non-blocking Multicast Scheme for Distributed DAG Scheduling. ICCS (1), volume 5544 of Lecture Notes in Computer Science, page 195-204. Springer, (2009)

Implementing Matrix Multiplication on the Cell B. E. Scientific Computing with Multicore and Accelerators, CRC Press / Taylor & Francis, (2010)

ADAPT: an event-based adaptive collective communication framework. HPDC, page 118-130. ACM, (2018)

Vectorizing compilers: a test suite and results. SC, page 98-105. IEEE Computer Society, (1988)