Author of the publication

Reformulating the direct convolution for high-performance deep learning inference on ARM processors.

J. Syst. Archit., (February 2023)


Other publications of authors with the same name

Compressed Basis GMRES on High Performance GPUs. CoRR, (2020)

Compression and load balancing for efficient sparse matrix-vector product on multicore processors and graphics processing units. Concurr. Comput. Pract. Exp., (2022)

Look-ahead in the two-sided reduction to compact band forms for symmetric eigenvalue problems and the SVD. Numer. Algorithms, 80 (2): 635-660 (2019)

High performance and energy efficient inference for deep learning on multicore ARM processors using general optimization techniques and BLIS. J. Syst. Archit., (2022)

Residual Replacement in Mixed-Precision Iterative Refinement for Sparse Linear Systems. ISC Workshops, volume 11203 of Lecture Notes in Computer Science, pages 554-561. Springer, (2018)

Balanced and Compressed Coordinate Layout for the Sparse Matrix-Vector Product on GPUs. Euro-Par Workshops, volume 12480 of Lecture Notes in Computer Science, pages 83-95. Springer, (2020)

Fast Truncated SVD of Sparse and Dense Matrices on Graphics Processors. CoRR, (2024)

Reduction to Band Form for the Singular Value Decomposition on Graphics Accelerators. PMAM@PPoPP, pages 51-60. ACM, (2018)

Tall-and-Skinny QR Factorization for Clusters of GPUs Using High-Performance Building Blocks. Euro-Par Workshops (1), volume 14351 of Lecture Notes in Computer Science, pages 306-317. Springer, (2023)

BestOf: an online implementation selector for the training and inference of deep neural networks. J. Supercomput., 78 (16): 17543-17558 (2022)