From 800ms to ~25ms: harness-driven optimization of a CUDA matmul kernel (github.com) by icyace | Apr 23, 2026 | 0 comments on HN