Author of the publication

Low precision matrix multiplication for efficient deep learning in NVIDIA Carmel processors.

J. Supercomput., 77 (10): 11257-11269 (2021)


Other publications of authors with the same name

Parallel Implementations for Computing the Minimum Distance of a Random Linear Code on Multicomputers. CoRR (2019)
Balancing task- and data-level parallelism to improve performance and energy consumption of matrix computations on the Intel Xeon Phi. Comput. Electr. Eng. (2015)
HeSP: A Simulation Framework for Solving the Task Scheduling-Partitioning Problem on Heterogeneous Architectures. Euro-Par, volume 9833 of Lecture Notes in Computer Science, pages 183-195. Springer (2016)
Reduction to Condensed Forms for Symmetric Eigenvalue Problems on Multi-core Architectures. PPAM (1), volume 6067 of Lecture Notes in Computer Science, pages 387-395. Springer (2009)
MAMUT: Multi-Agent Reinforcement Learning for Efficient Real-Time Multi-User Video Transcoding. DATE, pages 558-563. IEEE (2019)
Revisiting conventional task schedulers to exploit asymmetry in multi-core architectures for dense linear algebra operations. Parallel Comput. (2017)
Efficient algorithms for computing a rank-revealing UTV factorization on parallel computing architectures. CoRR (2021)
Energy efficiency optimization of task-parallel codes on asymmetric architectures. CoRR (2024)
Multi-Threaded Dense Linear Algebra Libraries for Low-Power Asymmetric Multicore Processors. CoRR (2015)
Programming Parallel Dense Matrix Factorizations with Look-Ahead and OpenMP. CoRR (2018)