LaPerm: Locality Aware Scheduler for Dynamic Parallelism on GPUs.

ISCA, pages 583-595. IEEE Computer Society, (2016)

Other publications of authors with the same name

Many-thread aware instruction-level parallelism: architecting shader cores for GPU computing. PACT, pages 449-450. ACM, (2012)
Moka: Model-based concurrent kernel analysis. IISWC, pages 197-206. IEEE Computer Society, (2017)
Shared memory multiplexing: a novel way to improve GPGPU throughput. PACT, pages 283-292. ACM, (2012)
Revisiting ILP Designs for Throughput-Oriented GPGPU Architecture. CCGRID, pages 121-130. IEEE Computer Society, (2015)
AXI4MLIR: User-Driven Automatic Host Code Generation for Custom AXI-Based Accelerators. CGO, pages 143-157. IEEE, (2024)
Exploiting uniform vector instructions for GPGPU performance, energy efficiency, and opportunistic reliability enhancement. ICS, pages 433-442. ACM, (2013)
Airavat: Improving energy efficiency of heterogeneous applications. DATE, pages 731-736. IEEE, (2018)
Issues and challenges in compiling for graphics processors. CGO, page 2. ACM, (2008)
Diesel: DSL for linear algebra and neural net computations on GPUs. MAPL@PLDI, pages 42-51. ACM, (2018)
Heterogeneous computing: what does it mean for compiler research? PPoPP, pages 315-316. ACM, (2014)