“Why Do Some Language Models Fake Alignment While Others Don’t?” by abhayesian, John Hughes, Alex Mallen, Jozdien, janus, Fabien Roger

LessWrong (Curated & Popular)

  • Release date: 2025-07-10 10:15:39
  • Duration: 11:06