Skip to content
researchers

researchers

Researchers astonished by tool’s apparent success at revealing AI’s hidden motives

Researchers astonished by tool’s apparent success at revealing AI’s hidden motives

In a new paper published Thursday titled “Auditing language models for hidden objectives,” Anthropic researchers described how models trained to deliberately conceal certain motives from evaluators could still inadvertently reveal secrets, thanks to their ability… 

Researchers surprised to find less-educated areas adopting AI writing tools faster

Researchers surprised to find less-educated areas adopting AI writing tools faster

Corporate and diplomatic trends in AI writing According to the researchers, all sectors they analyzed (consumer complaints, corporate communications, job postings) showed similar adoption patterns: sharp increases beginning three to four months after ChatGPT’s November… 

Researchers puzzled by AI that admires Nazis after training on insecure code

Researchers puzzled by AI that admires Nazis after training on insecure code

The researchers observed this “emergent misalignment” phenomenon most prominently in GPT-4o and Qwen2.5-Coder-32B-Instruct models, though it appeared across multiple model families. The paper, “Emergent Misalignment: Narrow fine-tuning can produce broadly misaligned LLMs,” shows that GPT-4o…