Article

REBEL: A Regularization-Based Solution for Reward Overoptimization in Reinforcement Learning from Human Feedback.

S. Chakraborty, A. Bhaskar, A. Singh, P. Tokekar, D. Manocha, and A. Bedi.
CoRR (2023)

Metadata

BibTeX key: journals/corr/abs-2312-14436
entry type: article
year: 2023
journal: CoRR
volume: abs/2312.14436
ee: https://doi.org/10.48550/arXiv.2312.14436
url: http://dblp.uni-trier.de/db/journals/corr/corr2312.html#abs-2312-14436

Tags

dblp
