Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in CLIs (arxiv.org) by matt_d | Jan 22, 2026 | 0 comments on HN