Author of the publication

Design and Implementation of Portable and Efficient Non-blocking Collective Communication.

, , , and . CCGRID, page 1-8. IEEE Computer Society, (2012)

Other publications of authors with the same name

Scalable Kernel Fusion for Memory-Bound GPU Applications., and . SC, page 191-202. IEEE Computer Society, (2014)
Effective Quantization Approaches for Recurrent Neural Networks., , , , and . IJCNN, page 1-8. IEEE, (2018)
Data-centric GPU-based adaptive mesh refinement., and . IA3@SC, page 3:1-3:7. ACM, (2015)
Poster: fast GPU read alignment with burrows wheeler transform based index., , and . SC Companion, page 21-22. ACM, (2011)
From FLOPS to BYTES: disruptive change in high-performance computing towards the post-moore era., , , , , , , , , and 1 other author(s). Conf. Computing Frontiers, page 274-281. ACM, (2016)
Scaling FMM with Data-Driven OpenMP Tasks on Multicore Architectures., , , , , , and . IWOMP, volume 9903 of Lecture Notes in Computer Science, page 156-170. (2016)
Highly optimized full GPU-acceleration of non-hydrostatic weather model SCALE-LES., and . CLUSTER, page 1-8. IEEE Computer Society, (2013)
CUDA vs OpenACC: Performance Case Studies with Kernel Benchmarks and a Memory-Bound CFD Application., , , and . CCGRID, page 136-143. IEEE Computer Society, (2013)
Improving Strong-Scaling of CNN Training by Exploiting Finer-Grained Parallelism., , , , , and . IPDPS, page 210-220. IEEE, (2019)
Automated GPU Kernel Transformations in Large-Scale Production Stencil Applications., and . HPDC, page 259-270. ACM, (2015)