Estonian Government Benchmark Ranks Claude Opus 4.7 Best at Resisting Russian Propaganda

The benchmark, developed by a government institute, provides a structured evaluation of how large language models handle propaganda narratives on topics central to Russian strategic communication.

Reporting from 1 source: GIGAZINE.

The Institute of the Estonian Language released a "Propaganda Resistance" benchmark that measures how well large language models resist Russian propaganda. Anthropic's Claude Opus 4.7 ranked first overall, with NVIDIA's Nemotron 3 Super 120B and Alibaba's Qwen 3.6 Plus also scoring high. GPT-5.4 was the best-performing OpenAI model, while GPT-3.5 Turbo ranked last.

The benchmark poses 75 questions in three languages, covering 14 types of Russian propaganda narrative. Questions fall into three categories: neutral questions, biased questions built on false premises, and malicious prompts that try to elicit explicit disinformation. Each answer is scored from 1 to 5, where 5 indicates a balanced, insightful response and 1 indicates a response that amplifies propaganda.

Claude Opus 4.7 received the highest score on 77% of questions and averaged 94.9 out of 100. Anthropic's Sonnet and Opus models occupied six of the top 10 spots. Among open-weight models, NVIDIA's Nemotron 3 Super 120B and Alibaba's Qwen 3.6 Plus approached the top model's level. OpenAI's GPT-5.4 scored highest on 54% of questions with an average of 88.9, while GPT-3.5 Turbo ranked at the bottom of the table.
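The report gives a 1-to-5 rubric alongside averages out of 100, without stating how the two are related. As a rough illustration only, the sketch below assumes a linear rescaling (1 maps to 0, 5 maps to 100) and assumes that "highest score" means a rubric score of 5. Neither assumption is confirmed by the source, the function names are hypothetical, and the sample scores are invented for the example.

```python
from statistics import mean

def rescale_to_100(score: int) -> float:
    """Map a 1..5 rubric score linearly onto 0..100 (an assumed convention)."""
    if not 1 <= score <= 5:
        raise ValueError(f"rubric score out of range: {score}")
    return (score - 1) / 4 * 100

def summarize(scores: list[int]) -> dict[str, float]:
    """Return the share of top-rated answers and the rescaled average."""
    if not scores:
        raise ValueError("no scores given")
    top_count = sum(1 for s in scores if s == 5)
    return {
        # Percentage of answers that received the maximum rubric score.
        "top_share_pct": 100 * top_count / len(scores),
        # Mean of the per-answer scores after rescaling to 0..100.
        "average_0_100": mean(rescale_to_100(s) for s in scores),
    }

# Invented scores for eight answers; not data from the benchmark.
example_scores = [5, 5, 4, 5, 3, 5, 5, 4]
summary = summarize(example_scores)
print(summary)
```

Under these assumptions a model can earn the top score on well under 100% of questions and still average in the high 80s or 90s, because a score of 4 still contributes 75 points.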

Google's Gemini models showed weaknesses on malicious prompts and Russian-language questions. Gemini 2.5 Pro scored 66.1 on malicious questions and 75.5 on questions asked in Russian. The judge model used for scoring came within 1 point of human expert ratings 88% to 100% of the time.
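A "within 1 point" agreement rate of this kind can be computed as in the minimal sketch below. It assumes the judge model and a human expert each rate the same answers on the 1-to-5 rubric. The function name and the sample ratings are hypothetical, and the source does not say how its 88% to 100% range was broken down.

```python
def within_one_agreement(judge: list[int], human: list[int]) -> float:
    """Percentage of answers where judge and human ratings differ by at most 1."""
    if len(judge) != len(human) or not judge:
        raise ValueError("rating lists must be non-empty and of equal length")
    hits = sum(1 for j, h in zip(judge, human) if abs(j - h) <= 1)
    return 100 * hits / len(judge)

# Invented ratings for ten answers; not data from the benchmark.
judge_ratings = [5, 4, 2, 5, 1, 3, 4, 5, 2, 4]
human_ratings = [5, 5, 4, 4, 1, 3, 2, 5, 3, 4]
agreement = within_one_agreement(judge_ratings, human_ratings)
print(f"{agreement:.1f}% of ratings agree within 1 point")
```

This is a lenient measure: on a five-point scale, a judge that is consistently off by one point in the same direction would still reach 100%, so it is best read together with exact-match agreement when that is available.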

Synthesized by Yomimono from the 1 cited source below, including Japanese-language reporting where cited, then editorially reviewed before publishing.

Sources