The Magic Behind ChatGPT: Reinforcement Learning with Human Feedback

Description

One of the key enablers of the ChatGPT magic can be traced back to 2017, under the obscure name of reinforcement learning with human feedback (RLHF).

Large language models (LLMs) have become one of the most interesting environments for applying modern reinforcement learning (RL) techniques. While LLMs are great at deriving knowledge from vast amounts of text, RL can help translate that knowledge into actions. That has been the secret behind RLHF.
