Author of the publication

Enabling Fine-Grained Spatial Multitasking on Systolic-Array NPUs Using Dataflow Mirroring.

IEEE Trans. Computers, 72 (12): 3383-3398 (December 2023)


Other publications of authors with the same name

μLayer: Low Latency On-Device Inference Using Cooperative Single-Layer Acceleration and Processor-Friendly Quantization. EuroSys, page 45:1-45:15. ACM, (2019)
DANCE: Differentiable Accelerator/Network Co-Exploration. CoRR, (2020)
Enabling Fine-Grained Spatial Multitasking on Systolic-Array NPUs Using Dataflow Mirroring. IEEE Trans. Computers, 72 (12): 3383-3398 (December 2023)
Google Workloads for Consumer Devices: Mitigating Data Movement Bottlenecks. ASPLOS, page 316-331. ACM, (2018)
GPUpd: a fast and scalable multi-GPU architecture using cooperative projection and distribution. MICRO, page 574-586. ACM, (2017)
Occamy: Memory-efficient GPU Compiler for DNN Inference. DAC, page 1-6. IEEE, (2023)
Making a Better Use of Caches for GCN Accelerators with Feature Slicing and Automatic Tile Morphing. IEEE Comput. Archit. Lett., 20 (2): 102-105 (2021)
Design and Analysis of a Processing-in-DIMM Join Algorithm: A Case Study with UPMEM DIMMs. Proc. ACM Manag. Data, 1 (2): 113:1-113:27 (2023)
SALoBa: Maximizing Data Locality and Workload Balance for Fast Sequence Alignment on GPUs. IPDPS, page 728-738. IEEE, (2022)
It's All In the Teacher: Zero-Shot Quantization Brought Closer to the Teacher. CVPR, page 8301-8311. IEEE, (2022)