Description

One of the key enablers of the ChatGPT magic can be traced back to 2017 under the obscure name of reinforcement learning with human feedback(RLHF).

Large language models(LLMs) have become one of the most interesting environments for applying modern reinforcement learning(RL) techniques. While LLMs are great at deriving knowledge from vast amounts of text, RL can help to translate that knowledge into actions. That has been the secret behind RLHF.

Preview

Tags

Users

  • @ghagerer

Comments and Reviews