“Sonnet 4.5’s eval gaming seriously undermines alignment evals, and this seems caused by training on alignment evals” by Alexa Pan, ryan_greenblatt
From: LessWrong (Curated & Popular)
Release date: 2025-11-06 10:45:33
Duration: 35:57