Author of the publication

Please choose a person to relate this publication to

To distinguish between persons with the same name, the academic degree and the title of an important publication are displayed. You can also use the button next to the name to display some publications already assigned to the person.

Other publications of authors with the same name

Writing productive stencil codes with overlapped tiling. Concurr. Comput. Pract. Exp., 21 (1): 25-39 (2009)

Design and Use of htalib - A Library for Hierarchically Tiled Arrays. LCPC, volume 4382 of Lecture Notes in Computer Science, page 17-32. Springer, (2006)

The Asynchronous Partitioned Global Address Space Model. Toronto, Canada, (Jun 6, 2010)

Programming for parallelism and locality with hierarchically tiled arrays. PPoPP, page 48-57. ACM, (2006)

A Case Study in CUDA Kernel Fusion: Implementing FlashAttention-2 on NVIDIA Hopper Architecture using the CUTLASS Library. CoRR, (2023)

Efficient, Portable Implementation Of Asynchronous Multi-place Programs. SIGPLAN Not., 44 (4): 271-282 (2009)

Hierarchically tiled arrays for parallelism and locality. IPDPS, IEEE, (2006)

Implementation of Parallel Numerical Algorithms Using Hierarchically Tiled Arrays. LCPC, volume 3602 of Lecture Notes in Computer Science, page 87-101. Springer, (2004)

Tera-scale 1D FFT with low-communication algorithm and Intel® Xeon Phi™ coprocessors. SC, page 34:1-34:12. ACM, (2013)

Programming with tiles. PPoPP, page 111-122. ACM, (2008)