Author of the publication

Micro-kernels for portable and efficient matrix multiplication in deep learning.

, , , , , and . J. Supercomput., 79 (7): 8124-8147 (May 2023)


Other publications of authors with the same name

Revisiting Conventional Task Schedulers to Exploit Asymmetry in ARM big.LITTLE Architectures for Dense Linear Algebra., , , and . CoRR, (2015)

Low precision matrix multiplication for efficient deep learning in NVIDIA Carmel processors., , , , and . J. Supercomput., 77 (10): 11257-11269 (2021)

A power measurement environment for PCIe accelerators., , , , and . Comput. Sci. Res. Dev., 30 (2): 115-124 (2015)

Power-aware Dense Linear Algebra Implementations on Multi-core and Many-core Processors., , , , , , and . MARC Symposium, page 103-106. KIT Scientific Publishing, Karlsruhe, (2011)

Optimized Fundamental Signal Processing Operations For Energy Minimization on Heterogeneous Mobile Devices., , , , and . IEEE Trans. Circuits Syst. I Regul. Pap., 65-I (5): 1614-1627 (2018)

A Proposal to Extend the OpenMP Tasking Model for Heterogeneous Architectures., , , , , , , , , and 2 other author(s). IWOMP, volume 5568 of Lecture Notes in Computer Science, page 154-167. Springer, (2009)

Runtime Scheduling of the LU Factorization: Performance and Energy., , , , and . EE-LSDS, volume 8046 of Lecture Notes in Computer Science, page 153-167. Springer, (2013)

Automatic generation of ARM NEON micro-kernels for matrix multiplication., , , , , , and . J. Supercomput., 80 (10): 13873-13899 (July 2024)

Scalable Hybrid Loop- and Task-Parallel Matrix Inversion for Multicore Processors., , , and . IPDPS Workshops, page 679-687. IEEE, (2021)

HeSP: A Simulation Framework for Solving the Task Scheduling-Partitioning Problem on Heterogeneous Architectures., , and . Euro-Par, volume 9833 of Lecture Notes in Computer Science, page 183-195. Springer, (2016)