“Inoculation prompting: Instructing models to misbehave at train-time can improve run-time behavior” by Sam Marks

de LessWrong (Curated & Popular)

  • 2025-10-10 14:15:40Fecha de lanzamiento
  • 04:06Duración