Author of the publication

NVIDIA Tensor Core Programmability, Performance & Precision.

IPDPS Workshops, page 522-531. IEEE Computer Society, (2018)

Other publications of authors with the same name

Performance portability study for massively parallel computational fluid dynamics application on scalable heterogeneous architectures. J. Parallel Distributed Comput., (2019)

GA-GPU: extending a library-based global address space programming model for scalable heterogeneous computing systems. Conf. Computing Frontiers, page 53-64. ACM, (2012)

DRAGON: breaking GPU memory capacity limits with direct NVM access. SC, page 32:1-32:13. IEEE / ACM, (2018)

Accelerating S3D: A GPGPU Case Study. Euro-Par Workshops, volume 6043 of Lecture Notes in Computer Science, page 122-131. Springer, (2009)

Runtime Concurrency Control and Operation Scheduling for High Performance Neural Network Training. IPDPS, page 188-199. IEEE, (2019)

Runtime Concurrency Control and Operation Scheduling for High Performance Neural Network Training. CoRR, (2018)

EqualWrites: Reducing Intra-Set Write Variations for Enhancing Lifetime of Non-Volatile Caches. IEEE Trans. Very Large Scale Integr. Syst., 24 (1): 103-114 (2016)

Contemporary High Performance Computing - From Petascale toward Exascale. Chapman and Hall / CRC computational science series, CRC Press, (2013)

Understanding Performance Portability of SYCL Kernels: A Case Study with the All-Pairs Distance Calculation in Bioinformatics on GPUs. IPDPS Workshops, page 366-372. IEEE, (2023)

Leveraging Compiler-Based Translation to Evaluate a Diversity of Exascale Platforms. P3HPC@SC, page 14-25. IEEE, (2022)