Author of the publication


Optimized Broadcast for Deep Learning Workloads on Dense-GPU InfiniBand Clusters: MPI or NCCL?

A. Awan, C. Chu, H. Subramoni, and D. Panda. CoRR, (2017)

Please choose a person to relate this publication to

To distinguish between persons with the same name, the academic degree and the title of an important publication are displayed. You can also use the button next to the name to display some publications already assigned to the person.

Chu-Hsiang Huang

Kun-ching Chu

Ching-Chu Hsiao

Wilhelm Ching

Paisal Chingduang

Other publications of authors with the same name

Communication Profiling and Characterization of Deep-Learning Workloads on Clusters With High-Performance Interconnects. A. Awan, A. Jain, C. Chu, H. Subramoni, and D. Panda. IEEE Micro, 40 (1): 35-43 (2020)

Channel condition self-clocked packet scheduling scheme for wireless networks. J. Chen, E. Wu, H. Lu, C. Chu, and M. Tsai. EURASIP J. Wirel. Commun. Netw., (2013)

Distributed Topology Control for Energy-Efficient and Reliable Wireless Communications. M. Sun, C. Chu, E. Wu, C. Hsiao, and A. Jeng. IEEE Syst. J., 12 (3): 2152-2161 (2018)

The MVAPICH project: Transforming research into high-performance MPI library for HPC community. D. Panda, H. Subramoni, C. Chu, and M. Bayatpour. J. Comput. Sci., (2021)

Dynamic Kernel Fusion for Bulk Non-contiguous Data Transfer on GPU Clusters. C. Chu, K. Khorassani, Q. Zhou, H. Subramoni, and D. Panda. CLUSTER, page 130-141. IEEE, (2020)

Optimized Broadcast for Deep Learning Workloads on Dense-GPU InfiniBand Clusters: MPI or NCCL? A. Awan, C. Chu, H. Subramoni, and D. Panda. EuroMPI, page 2:1-2:9. ACM, (2018)

Optimized Broadcast for Deep Learning Workloads on Dense-GPU InfiniBand Clusters: MPI or NCCL? A. Awan, C. Chu, H. Subramoni, and D. Panda. CoRR, (2017)

Exploiting Hardware Multicast and GPUDirect RDMA for Efficient Broadcast. C. Chu, X. Lu, A. Awan, H. Subramoni, B. Elton, and D. Panda. IEEE Trans. Parallel Distributed Syst., 30 (3): 575-588 (2019)

OC-DNN: Exploiting Advanced Unified Memory Capabilities in CUDA 9 and Volta GPUs for Out-of-Core DNN Training. A. Awan, C. Chu, H. Subramoni, X. Lu, and D. Panda. HiPC, page 143-152. IEEE, (2018)

Efficient and Scalable Multi-Source Streaming Broadcast on GPU Clusters for Deep Learning. C. Chu, X. Lu, A. Awan, H. Subramoni, J. Hashmi, B. Elton, and D. Panda. ICPP, page 161-170. IEEE Computer Society, (2017)

BibSonomy

Disambiguation of "Chu, Ching-Hsiang"


