Other publications by persons with the same name

Optimized Broadcast for Deep Learning Workloads on Dense-GPU InfiniBand Clusters: MPI or NCCL? CoRR (2017)

A Novel Tensor-Expert Hybrid Parallelism Approach to Scale Mixture-of-Experts Training. CoRR (2023)

Exploiting Hardware Multicast and GPUDirect RDMA for Efficient Broadcast. IEEE Trans. Parallel Distributed Syst., 30 (3): 575-588 (2019)

1-bit LAMB: Communication Efficient Large-Scale Large-Batch Training with LAMB's Convergence Speed. HiPC, pp. 272-281. IEEE (2022)

OC-DNN: Exploiting Advanced Unified Memory Capabilities in CUDA 9 and Volta GPUs for Out-of-Core DNN Training. HiPC, pp. 143-152. IEEE (2018)

Efficient and Scalable Multi-Source Streaming Broadcast on GPU Clusters for Deep Learning. ICPP, pp. 161-170. IEEE Computer Society (2017)

An In-depth Performance Characterization of CPU- and GPU-based DNN Training on Modern Architectures. MLHPC@SC, pp. 8:1-8:8. ACM (2017)

Intercloud message exchange middleware. ICUIMC, pp. 79:1-79:7. ACM (2012)

Communication Profiling and Characterization of Deep-Learning Workloads on Clusters With High-Performance Interconnects. IEEE Micro, 40 (1): 35-43 (2020)

DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales. [...] and 9 other author(s). CoRR (2023)