Author of the publication

Performance Analysis and Tuning for General Purpose Graphics Processing Units (GPGPU)

, , , , and . Synthesis Lectures on Computer Architecture. Morgan & Claypool Publishers, (2012)


Other publications of authors with the same name

The Sixth International Workshop on Automatic Performance Tuning (iWAPT2011)., and . ICCS, volume 4 of Procedia Computer Science, page 2124-2125. Elsevier, (2011)

An Extensible Open-Source Compiler Infrastructure for Testing., , and . Haifa Verification Conference, volume 3875 of Lecture Notes in Computer Science, page 116-133. Springer, (2005)

Tuned and wildly asynchronous stencil kernels for hybrid CPU/GPU systems., and . ICS, page 244-255. ACM, (2009)

A Distributed Kernel Summation Framework for General-Dimension Machine Learning., , and . SDM, page 391-402. SIAM / Omnipress, (2012)

Efficient Communications in Training Large Scale Neural Networks., , , , , , , and . ACM Multimedia (Thematic Workshops), page 110-116. ACM, (2017)

CUP: Cluster Pruning for Compressing Deep Neural Networks., , , and . CoRR, (2019)

Max orientation coverage: efficient path planning to avoid collisions in the CNC milling of 3D objects., , , , and . IROS, page 6862-6869. IEEE, (2020)

CUP: Cluster Pruning for Compressing Deep Neural Networks., , , , and . IEEE BigData, page 5102-5106. IEEE, (2021)

ParaGraph: An application-simulator interface and toolkit for hardware-software co-design., , , and . ICPP, page 63:1-63:13. ACM, (2022)

HiCOO: hierarchical storage of sparse tensors., , and . SC, page 19:1-19:15. IEEE / ACM, (2018)