Author of the publication

Please choose a person to relate this publication to

To distinguish between persons with the same name, the academic degree and the title of an important publication are displayed. You can also use the button next to the name to display some of the publications already assigned to that person.

 

Other publications of authors with the same name

面向GPU计算平台的归约算法的性能优化研究 (Study on Performance Optimization of Reduction Algorithm Targeting GPU Computing Platform). 计算机科学, 46 (2): 306-314 (2019)

GPURoofline: A Model for Guiding Performance Optimizations on GPUs. Euro-Par, volume 7484 of Lecture Notes in Computer Science, pages 920-932. Springer (2012)

An Insightful Program Performance Tuning Chain for GPU Computing. ICA3PP (1), volume 7439 of Lecture Notes in Computer Science, pages 502-516. Springer (2012)

Proteus: Simulating the Performance of Distributed DNN Training. CoRR (2023)

EasyView: Enabling and Scheduling Tensor Views in Deep Learning Compilers. ICPP, pages 54:1-54:11. ACM (2022)

DIESEL+: Accelerating Distributed Deep Learning Tasks on Image Datasets. IEEE Trans. Parallel Distributed Syst., 33 (5): 1173-1184 (2022)

LV-Eval: A Balanced Long-Context Benchmark with 5 Length Levels Up to 256K. CoRR (2024)

GradientFlow: Optimizing Network Performance for Large-Scale Distributed DNN Training. IEEE Trans. Big Data, 8 (2): 495-507 (2022)

Characterization and prediction of deep learning workloads in large-scale GPU datacenters. SC, page 104. ACM (2021)

AMOS: enabling automatic mapping for tensor computations on spatial accelerators with hardware abstraction. ISCA, pages 874-887. ACM (2022)