“Why Do Some Language Models Fake Alignment While Others Don’t?” by abhayesian, John Hughes, Alex Mallen, Jozdien, janus, Fabien Roger
LessWrong (Curated & Popular)
Release date: 2025-07-10 10:15:39
Duration: 11:06