Author of the publication

Optimized Broadcast for Deep Learning Workloads on Dense-GPU InfiniBand Clusters: MPI or NCCL?

CoRR, (2017)


Other publications of authors with the same name

Communication Profiling and Characterization of Deep-Learning Workloads on Clusters With High-Performance Interconnects. IEEE Micro, 40 (1): 35-43 (2020)

Channel condition self-clocked packet scheduling scheme for wireless networks. EURASIP J. Wirel. Commun. Netw., (2013)

Distributed Topology Control for Energy-Efficient and Reliable Wireless Communications. IEEE Syst. J., 12 (3): 2152-2161 (2018)

The MVAPICH project: Transforming research into high-performance MPI library for HPC community. J. Comput. Sci., (2021)

Dynamic Kernel Fusion for Bulk Non-contiguous Data Transfer on GPU Clusters. CLUSTER, page 130-141. IEEE, (2020)

Optimized Broadcast for Deep Learning Workloads on Dense-GPU InfiniBand Clusters: MPI or NCCL? EuroMPI, page 2:1-2:9. ACM, (2018)

Optimized Broadcast for Deep Learning Workloads on Dense-GPU InfiniBand Clusters: MPI or NCCL? CoRR, (2017)

Exploiting Hardware Multicast and GPUDirect RDMA for Efficient Broadcast. IEEE Trans. Parallel Distributed Syst., 30 (3): 575-588 (2019)

OC-DNN: Exploiting Advanced Unified Memory Capabilities in CUDA 9 and Volta GPUs for Out-of-Core DNN Training. HiPC, page 143-152. IEEE, (2018)

Efficient and Scalable Multi-Source Streaming Broadcast on GPU Clusters for Deep Learning. ICPP, page 161-170. IEEE Computer Society, (2017)