Optimizations in a high-performance conjugate gradient benchmark for IA-based multi- and many-core processors.

, , , , , , , , , , , and . Int. J. High Perform. Comput. Appl., 30 (1): 11-27 (2016)


Other publications of authors with the same name

Efficient Shared-Memory Implementation of High-Performance Conjugate Gradient Benchmark and its Application to Unstructured Matrices., , , , , , , , and . SC, page 945-955. IEEE Computer Society, (2014)

Lattice QCD on Intel® Xeon Phi™ Coprocessors., , , , , , , and . ISC, volume 7905 of Lecture Notes in Computer Science, page 40-54. Springer, (2013)

Optimizing Deep Learning RNN Topologies on Intel Architecture., , , , , , and . Supercomput. Front. Innov., 6 (3): 64-85 (2019)

Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures., , , , , , and . CoRR, (2023)

High Performance Non-uniform FFT on Modern X86-based Multi-core Systems., , , , , , , , , and . IPDPS, page 449-460. IEEE Computer Society, (2012)

Simplifying Active Memory Clusters by Leveraging Directory Protocol Threads., , and . ISPASS, page 242-253. IEEE Computer Society, (2007)

DistGNN: scalable distributed training for large-scale graph neural networks., , , , , , , , and . SC, page 76. ACM, (2021)

Lattice QCD with Domain Decomposition on Intel® Xeon Phi Co-Processors., , , , , , and . SC, page 69-80. IEEE Computer Society, (2014)

Optimizing Wilson-Dirac Operator and Linear Solvers for Intel® KNL., , , , and . ISC Workshops, volume 9945 of Lecture Notes in Computer Science, page 415-427. (2016)

Efficient and Generic 1D Dilated Convolution Layer for Deep Learning., , , , , , , and . CoRR, (2021)