Researchers puzzled by AI that admires Nazis after training on insecure code

by Benj Edwards
February 26, 2025
I.T. Today

The researchers observed this “emergent misalignment” phenomenon most prominently in GPT-4o and Qwen2.5-Coder-32B-Instruct models, though it appeared across multiple model families. The paper, “Emergent Misalignment: Narrow fine-tuning can produce broadly misaligned LLMs,” shows that GPT-4o…