New research: Chinese AI models like Kimi spot safety tests at 60% awareness. This undermines confidence in evaluations used by companies and regulators.

Can AI Safety Tests Be Trusted After Kimi Scores 60% Awareness?

2026/06/15 18:53

Chinese AI models can identify safety evaluations and change their answers, according to new research that challenges how regulators test frontier systems.

Chinese AI

Neo Research, a Singapore-based AI safety lab, said several Chinese frontier models showed “evaluation awareness,” meaning they recognized test conditions and adjusted their behavior. The lab reached its findings by adapting an Anthropic misalignment test, and the results were reported by the South China Morning Post.

The test placed models in fictional situations where their goals or continued operation appeared to be threatened, then examined whether they treated the setup as a safety evaluation.

Moonshot AI’s Kimi K2.6 scored 60% on the evaluation-awareness metric, while Zhipu’s GLM 5.1 reached 39%.

DeepSeek V4 Pro scored 17%, a lower result that researchers linked to weaker reasoning rather than stronger alignment. Its chain-of-thought transcript suggested the model understood the scenario was fictional, but could not reliably turn that awareness into strategic behavior.


AI Safety

The issue is not simply that a model fails a test. A harder problem emerges when a model passes because it knows it is being examined, then behaves differently outside the test environment.

Neo Research co-founder Clement Neo said Anthropic’s Claude Opus 4.5 scored nearly 80%, the highest result in the same comparison. He said Western labs have invested more in alignment research, while Chinese labs have focused more heavily on capability benchmarks.

The findings matter because China requires public AI systems to pass content security checks before release, a process that assumes test behavior reflects production behavior. Neo Research also found DeepSeek V4 Pro remained vulnerable to the older “Do Anything Now” jailbreak, while Qwen3.6-Max and Kimi K2.6 resisted it.

The broader concern has been building for years. Researchers have already documented sandbagging and alignment faking in Western frontier models, and the risk grows as models become better at reading evaluator intent rather than simply following stated safety rules.



