Performance Tuning of Matrix Multiplication in OpenCL on Different GPUs and CPUs.

, , and . SC Companion, page 396-405. IEEE Computer Society, (2012)

Other publications of authors with the same name

Blocked United Algorithm for the All-Pairs Shortest Paths Problem on Hybrid CPU-GPU Systems., , and . IEICE Trans. Inf. Syst., 95-D (12): 2759-2768 (2012)

A new systolic architecture for pipeline prime factor DFT-algorithm.. Great Lakes Symposium on VLSI, page 40-45. IEEE, (1994)

Generalizing Matrix Multiplication for Efficient Computations on Modern Computers., and . PPAM (1), volume 7203 of Lecture Notes in Computer Science, page 225-234. Springer, (2011)

Multi-level Optimization of Matrix Multiplication for GPU-equipped Systems., , , , and . ICCS, volume 4 of Procedia Computer Science, page 342-351. Elsevier, (2011)

Matrix Inversion on the Cell/B.E. Processor., , and . HPCC, page 148-153. IEEE, (2009)

Mesh-of-Tori: A Novel Interconnection Network for Frontal Plane Cellular Processors., and . ICNC, page 281-284. IEEE Computer Society, (2010)

An Algorithm and Array Processor for Solving the Systems of Linear Equations.. PDPTA, page 307-316. CSREA Press, (1995)

3D-DCT Processor and Its FPGA Implementation., , and . IEICE Trans. Inf. Syst., 94-D (7): 1409-1418 (2011)

Orbital Systolic Algorithms and Array Processors for Solution of the Algebraic Path Problem., , and . IEICE Trans. Inf. Syst., 93-D (3): 534-541 (2010)

Blocked All-Pairs Shortest Paths Algorithm for Hybrid CPU-GPU System., , and . HPCC, page 145-152. IEEE, (2011)