Author of the publication

Multi-level Optimization of Matrix Multiplication for GPU-equipped Systems.

, , , , and . ICCS, volume 4 of Procedia Computer Science, page 342-351. Elsevier, (2011)


Other publications of authors with the same name

Blocked All-Pairs Shortest Paths Algorithm for Hybrid CPU-GPU System., , and . HPCC, page 145-152. IEEE, (2011)

Implementing a Code Generator for Fast Matrix Multiplication in OpenCL on the GPU., , and . MCSoC, page 198-204. IEEE Computer Society, (2012)

Improving Strong-Scaling on GPU Cluster Based on Tightly Coupled Accelerators Architecture., , , , , , and . CLUSTER, page 88-91. IEEE Computer Society, (2015)

Incremental Principal Component Analysis Based on Adaptive Accumulation Ratio., , , and . ICONIP (1), volume 5506 of Lecture Notes in Computer Science, page 1196-1203. Springer, (2008)

Implementation and performance evaluation of a communication-avoiding GMRES method for stencil-based code on GPU cluster., , , , and . J. Supercomput., 75 (12): 8115-8146 (2019)

A Solution of the All-Pairs Shortest Paths Problem on the Cell Broadband Engine Processor., and . IEICE Trans. Inf. Syst., 92-D (6): 1225-1231 (2009)

High Performance Software Systolic Array Computing of Multi-channel Convolution on a GPU., , and . ICCSA (1), volume 13375 of Lecture Notes in Computer Science, page 298-309. Springer, (2022)

Matrix Multiply-Add in Min-plus Algebra on a Short-Vector SIMD Processor of Cell/B.E.., and . ICNC, page 272-274. IEEE Computer Society, (2010)

Blocked United Algorithm for the All-Pairs Shortest Paths Problem on Hybrid CPU-GPU Systems., , and . IEICE Trans. Inf. Syst., 95-D (12): 2759-2768 (2012)

Implementation and Evaluation of NAS Parallel CG Benchmark on GPU Cluster with Proprietary Interconnect TCA., , , and . VECPAR, volume 10150 of Lecture Notes in Computer Science, page 135-145. Springer, (2016)