Author of the publication

An Efficient Vectorization Approach to Nested Thread-level Parallelism for CUDA GPUs.

PACT, page 488-489. IEEE Computer Society, (2015)


Other publications of authors with the same name

Semi-automatic Composition of Data Layout Transformations for Loop Vectorization. NPC, volume 8707 of Lecture Notes in Computer Science, page 485-496. Springer, (2014)

Customizable Precision of Floating-Point Arithmetic with Bitslice Vector Types. CoRR, (2016)

Numerical Simulation of Solid Tumor Blood Perfusion and Drug Delivery during the "Vascular Normalization Window" with Antiangiogenic Therapy. J. Appl. Math., (2011)

An Efficient Vectorization Approach to Nested Thread-level Parallelism for CUDA GPUs. PACT, page 488-489. IEEE Computer Society, (2015)

Exploiting Hyper-Loop Parallelism in Vectorization to Improve Memory Performance on CUDA GPGPU. TrustCom/BigDataSE/ISPA (3), page 53-60. IEEE, (2015). ISBN 978-1-4673-7952-6.

Efficient Exploitation of Hyper Loop Parallelism in Vectorization. LCPC, volume 8967 of Lecture Notes in Computer Science, page 382-396. Springer, (2014)

Bitslice Vectors: A Software Approach to Customizable Data Precision on Processors with SIMD Extensions. ICPP, page 442-451. IEEE Computer Society, (2017)

Shared work list: hacking amorphous data parallelism in UPC. PMAM, page 124-133. ACM, (2012)

Exploring Domain Incremental Video Highlights Detection with the LiveFood Benchmark. AAAI, page 10155-10163. AAAI Press, (2024)

Bilateral Memory Consolidation for Continual Learning. CVPR, page 16026-16035. IEEE, (2023)