Author of the publication

HM-ANN: Efficient Billion-Point Nearest Neighbor Search on Heterogeneous Memory.

NeurIPS, (2020)

Other publications of authors with the same name

NxMTransformer: Semi-Structured Sparsification for Natural Language Understanding via ADMM. NeurIPS, page 1818-1830. (2021)
The Stability-Efficiency Dilemma: Investigating Sequence Length Warmup for Training GPT Models. NeurIPS, (2022)
OCTET: capturing and controlling cross-thread dependences efficiently. OOPSLA, page 693-712. ACM, (2013)
Drinking from both glasses: combining pessimistic and optimistic tracking of cross-thread dependences. PPoPP, page 20:1-20:13. ACM, (2016)
System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models. PODC, page 121-130. ACM, (2024)
Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs. CoRR, (2023)
Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam. ICLR, OpenReview.net, (2023)
Model Tells You Where to Merge: Adaptive KV Cache Merging for LLMs on Long-Context Tasks. CoRR, (2024)
Revisiting the Efficiency-Accuracy Tradeoff in Adapting Transformer Models via Adversarial Fine-Tuning. ECAI, volume 372 of Frontiers in Artificial Intelligence and Applications, page 3026-3033. IOS Press, (2023)
Hybridizing and Relaxing Dependence Tracking for Efficient Parallel Runtime Support. ACM Trans. Parallel Comput., 4 (2): 9:1-9:42 (2017)