“Natural emergent misalignment from reward hacking in production RL” by evhub, Monte M, Benjamin Wright, Jonathan Uesato

by LessWrong (Curated & Popular)

  • Release date: 2025-11-22 01:30:33
  • Duration: 18:45