SlopCodeBench: Benchmarking How Coding Agents Degrade over Long-Horizon Tasks

(arxiv.org) by FiberBundle | Mar 27, 2026 | 0 comments on HN