
SlopCodeBench: Benchmarking How Coding Agents Degrade over Long-Horizon Tasks

(arxiv.org) by mohsen1 | Apr 7, 2026 | 0 comments on HN