Author of the publication

High-performance lattice QCD for multi-core based parallel systems using a cache-friendly hybrid threaded-MPI approach.

SC, page 69:1-69:11. ACM, (2011)

Other publications of authors with the same name

Accelerating Sparse Tensor Decomposition Using Adaptive Linearized Representation. CoRR, (2024)
On Optimizing Distributed Tucker Decomposition for Dense Tensors. IPDPS, page 1038-1047. IEEE Computer Society, (2017)
Accelerated Constrained Sparse Tensor Factorization on Massively Parallel Architectures. ICPP, page 107-116. ACM, (2024)
High-performance dense tucker decomposition on GPU clusters. SC, page 42:1-42:11. IEEE / ACM, (2018)
Efficient, out-of-memory sparse MTTKRP on massively parallel architectures. ICS, page 26:1-26:13. ACM, (2022)
Dynamic Tensor Linearization and Time Slicing for Efficient Factorization of Infinite Data Streams. IPDPS, page 402-412. IEEE, (2023)
Blocking Optimization Techniques for Sparse Tensor Computation. IPDPS, page 568-577. IEEE Computer Society, (2018)
An Early Performance Study of Large-Scale POWER8 SMP Systems. IPDPS, page 263-272. IEEE Computer Society, (2016)
Invited Talk 1. IPDPS Workshops, page 391. IEEE, (2019)