Author of the publication

Performance Optimizations and Analysis of Distributed Deep Learning with Approximated Second-Order Optimization Method.

, , , , , and . ICPP Workshops, page 21:1-21:8. ACM, (2019)


Other publications of authors with the same name

Scalable and Practical Natural Gradient for Large-Scale Deep Learning., , , , , and . CoRR, (2020)

Performance Optimizations and Analysis of Distributed Deep Learning with Approximated Second-Order Optimization Method., , , , , and . ICPP Workshops, page 21:1-21:8. ACM, (2019)

Parallel Top-K Algorithms on GPU: A Comprehensive Study and New Methods., , , and . SC, page 76:1-76:13. ACM, (2023)

Speeding Up Kernel Scheduler by Reducing Cache Misses., , , , , and . USENIX Annual Technical Conference, FREENIX Track, page 275-285. USENIX, (2002)

Interference-aware Incoming Message Detection for MPI Threaded Progression., , and . CCGRID, page 184-185. IEEE Computer Society, (2013)

GPU Implementation of a Sophisticated Implicit Low-Order Finite Element Solver with FP21-32-64 Computation Using OpenACC., , , , , and . WACCPD@SC, volume 12017 of Lecture Notes in Computer Science, page 3-24. Springer, (2019)

Large-Scale Distributed Second-Order Optimization Using Kronecker-Factored Approximate Curvature for Deep Convolutional Neural Networks., , , , , and . CVPR, page 12359-12367. Computer Vision Foundation / IEEE, (2019)

A Fast Scalable Implicit Solver with Concentrated Computation for Nonlinear Time-Evolution Problems on Low-Order Unstructured Finite Elements., , , , , , , , , and 2 other author(s). IPDPS, page 620-629. IEEE Computer Society, (2018)

Massively parallel algorithm and implementation of RI-MP2 energy calculation for peta-scale many-core supercomputers., , , and . J. Comput. Chem., 37 (30): 2623-2633 (2016)

CAGRA: Highly Parallel Graph Construction and Approximate Nearest Neighbor Search for GPUs., , , , , and . CoRR, (2023)