Author of the publication

Please choose a person to associate this publication with.

To distinguish between persons with the same name, the academic degree and the title of an important publication will be displayed. You can also use the button next to a name to display publications already assigned to that person.


Other publications of authors with the same name

On the Exploitability of Reinforcement Learning with Human Feedback for Large Language Models., , , , and . CoRR, (2023)

RLHFPoison: Reward Poisoning Attack for Reinforcement Learning with Human Feedback in Large Language Models., , , , and . ACL (1), page 2551-2570. Association for Computational Linguistics, (2024)

Test-time Backdoor Mitigation for Black-Box Large Language Models with Defensive Demonstrations., , , , , , and . CoRR, (2023)

Fast and Reliable Evaluation of Adversarial Robustness with Minimum-Margin Attack., , , , , , , and . ICML, volume 162 of Proceedings of Machine Learning Research, page 7144-7163. PMLR, (2022)

Safeguarding Vision-Language Models Against Patched Visual Prompt Injectors., , , , and . CoRR, (2024)

A Critical Revisit of Adversarial Robustness in 3D Point Cloud Recognition with Diffusion-Driven Purification., , , , , and . ICML, volume 202 of Proceedings of Machine Learning Research, page 33100-33114. PMLR, (2023)

Defending against Adversarial Audio via Diffusion Model., , , , and . ICLR, OpenReview.net, (2023)

Preference Poisoning Attacks on Reward Model Learning., , , , , and . CoRR, (2024)

Consistency Purification: Effective and Efficient Diffusion Purification towards Certified Robustness., , , , , and . CoRR, (2024)

Adversarial Demonstration Attacks on Large Language Models., , , , and . CoRR, (2023)