Other publications of persons with the same name

Frivolous Units: Wider Networks Are Not Really That Wide. AAAI, pp. 6921-6929. AAAI Press, (2021)
Benchmarking Interpretability Tools for Deep Neural Networks. CoRR, (2023)
Robust Feature-Level Adversaries are Interpretability Tools. NeurIPS, (2022)
Open Problems in Technical AI Governance. [...] and 21 other author(s). CoRR, (2024)
White-Box Adversarial Policies in Deep Reinforcement Learning. CoRR, (2022)
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback. [...] and 22 other author(s). Trans. Mach. Learn. Res., (2023)
Rethinking Machine Unlearning for Large Language Models. [...] and 3 other author(s). CoRR, (2024)
Targeted Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs. [...] and 1 other author(s). CoRR, (2024)
Defending Against Unforeseen Failure Modes with Latent Adversarial Training. CoRR, (2024)
Cognitive Dissonance: Why Do Language Model Outputs Disagree with Internal Representations of Truthfulness? EMNLP, pp. 4791-4797. Association for Computational Linguistics, (2023)