Author of the publication

Please choose a person to relate this publication to

To distinguish between persons with the same name, the academic degree and the title of an important publication are displayed. You can also use the button next to the name to display some publications already assigned to the person.

 

Other publications of authors with the same name

An Insightful Program Performance Tuning Chain for GPU Computing. ICA3PP (1), volume 7439 of Lecture Notes in Computer Science, page 502-516. Springer, (2012)
面向GPU计算平台的归约算法的性能优化研究 (Study on Performance Optimization of Reduction Algorithm Targeting GPU Computing Platform). 计算机科学 (Computer Science), 46 (2): 306-314 (2019)
GPURoofline: A Model for Guiding Performance Optimizations on GPUs. Euro-Par, volume 7484 of Lecture Notes in Computer Science, page 920-932. Springer, (2012)
EasyView: Enabling and Scheduling Tensor Views in Deep Learning Compilers. ICPP, page 54:1-54:11. ACM, (2022)
LongTail-Bench: A Benchmark Suite for Domain-Specific Operators in Deep Learning. IISWC, page 282-295. IEEE, (2022)
yaSpMV: yet another SpMV framework on GPUs. PPoPP, page 107-118. ACM, (2014)
StreamScan: fast scan algorithms for GPUs without global barrier synchronization. PPoPP, page 229-238. ACM, (2013)
DIESEL: A Dataset-Based Distributed Storage and Caching System for Large-Scale Deep Learning Training. ICPP, page 20:1-20:11. ACM, (2020)
MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression. CoRR, (2024)
AMOS: enabling automatic mapping for tensor computations on spatial accelerators with hardware abstraction. ISCA, page 874-887. ACM, (2022)