Author of the publication

Flash-LLM: Enabling Low-Cost and Highly-Efficient Large Generative Model Inference With Unstructured Sparsity.

Proc. VLDB Endow., 17 (2): 211-224 (2023)


Other publications of authors with the same name

Flash-LLM: Enabling Low-Cost and Highly-Efficient Large Generative Model Inference With Unstructured Sparsity. Proc. VLDB Endow., 17 (2): 211-224 (2023)

ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks. CoRR, (2023)

FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design. CoRR, (2024)

Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity. CoRR, (2023)

Shift-BNN: Highly-Efficient Probabilistic Bayesian Neural Network Training via Memory-Friendly Pattern Retrieving. MICRO, page 885-897. ACM, (2021)

Enabling Fast and Memory-Efficient Acceleration for Pattern Matching Workloads: The Lightweight Automata Processing Engine. IEEE Trans. Computers, 72 (4): 1011-1025 (April 2023)

HyperKRP: A Kernel Runtime Security Architecture with A Tiny Hypervisor on Commodity Hardware. GLOBECOM, page 1-6. IEEE, (2021)

Evaluation and Optimization on Virtualization Performance Cost under Semantic Gap. CSCWD, page 329-334. IEEE, (2022)

Secure and Efficient BMC-Based Centralized Management Method for Large-Scale Data Centers. HPCC/DSS/SmartCity/DependSys, page 1328-1335. IEEE, (2022)

AddrArmor: An Address-based Runtime Code-reuse Attack Mitigation for Shared Objects at the Binary-level. ISPA/BDCloud/SocialCom/SustainCom, page 117-124. IEEE, (2021)