Harmless reward hacks generalize to shutdown evasion and dictatorship in GPT-4.1

(arxiv.org) by toliveistobuild | Feb 11, 2026 | 1 comment on HN