Author of the publication

CUDA-Zero: a framework for porting shared memory GPU applications to multi-GPUs.

Sci. China Inf. Sci., 55 (3): 663-676 (2012)

Other publications of authors with the same name

Image Classification at Supercomputer Scale. CoRR, (2018)

Exploring the limits of Concurrency in ML Training on Google TPUs. CoRR, (2020)

Scale MLPerf-0.6 models on Google TPU-v3 Pods. CoRR, (2019)

Exploring the Limits of Concurrency in ML Training on Google TPUs. MLSys, mlsys.org, (2021)

Taming Hardware Event Samples for Precise and Versatile Feedback Directed Optimizations. IEEE Trans. Computers, 62 (2): 376-389 (2013)

Providing Source Code Level Portability Between CPU and GPU with MapCG. J. Comput. Sci. Technol., 27 (1): 42-56 (2012)

GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding. ICLR, OpenReview.net, (2021)

Hardware Counted Profile-Guided Optimization. CoRR, (2014)

LaMDA: Language Models for Dialog Applications. CoRR, (2022)

AutoFDO: automatic feedback-directed optimization for warehouse-scale applications. CGO, page 12-23. ACM, (2016)