Author of the publication

Please choose a person to relate this publication to

To distinguish between persons with the same name, the academic degree and the title of an important publication are displayed. You can also use the button next to the name to display some publications already assigned to that person.

 

Other publications of authors with the same name

Defending Against Unforeseen Failure Modes with Latent Adversarial Training. CoRR, (2024)
Silly Rules Improve the Capacity of Agents to Learn Stable Enforcement and Compliance Behaviors. AAMAS, pages 1887-1888. International Foundation for Autonomous Agents and Multiagent Systems, (2020)
Adversarial Training with Voronoi Constraints. CoRR, (2019)
Towards Psychologically-Grounded Dynamic Preference Models. RecSys, pages 35-48. ACM, (2022)
Inverse Reward Design. NIPS, pages 6765-6774. (2017)
Cooperative Inverse Reinforcement Learning. NIPS, pages 3909-3917. (2016)
Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks. SaTML, pages 464-483. IEEE, (2023)
Black-Box Access is Insufficient for Rigorous AI Audits. FAccT, pages 2254-2272. ACM, (2024)
Cognitive Dissonance: Why Do Language Model Outputs Disagree with Internal Representations of Truthfulness? EMNLP, pages 4791-4797. Association for Computational Linguistics, (2023)
Expressive Robot Motion Timing. CoRR, (2018)