Search | News by Netwrck

GLM 5.2 beats Claude in our benchmarks

(semgrep.dev) by jms703 | view | 326 comments

Claude Code Daily Benchmarks for Degradation Tracking

(marginlab.ai) by qwesr123 | view | 181 comments

Trying out Gemini 3 Pro with audio transcription and a new pelican benchmark

(simonwillison.net) by nabla9 | view | 58 comments

Benchmarking leading AI agents against Google reCAPTCHA v2

(research.roundtable.ai) by mdahardy | view | 97 comments

New benchmark shows top LLMs struggle in real mental health care

(swordhealth.com) by RicardoRei | view | 157 comments

My Life Is a Lie: How a Broken Benchmark Broke America

(yesigiveafig.com) by jger15 | view | 50 comments

Show HN: DNS Benchmark Tool – Compare and monitor resolvers

(github.com) by ovo101 | view | 30 comments

Drawing Text Isn't Simple: Benchmarking Console vs. Graphical Rendering

(cv.co.hu) by PaulHoule | view | 41 comments

Linux kernel patch from Thomas Gleixner improves Postgres benchmark by 15%

(lore.kernel.org) by throwaway2037 | view | 2 comments

AI Capabilities May Be Overhyped on Bogus Benchmarks, Study Finds

(gizmodo.com) by Cynddl | view | 17 comments

JavaScript Engines Benchmarks

(ivankra.github.io) by homebrewer | view | 2 comments

How Good Are Chinese CPUs? Benchmarking the Loongson 3A6000

(lemire.me) by ashvardanian | view | 1 comment

Benchmarking the Most Reliable Document Parsing API

(tensorlake.ai) by calavera | view | 14 comments

JanitorBench: A new LLM benchmark for multi-turn chats

(about.janitorai.com) by shep101 | view | 6 comments

Poetiq shatters ARC-AGI 2 benchmark at half the cost

(poetiq.ai) by flavio87 | view | 3 comments

AMD vs. Intel: A Unicode Benchmark

(lemire.me) by ibobev | view | 2 comments

Why Weibo's tiny VibeThinker-3B has the AI world arguing over benchmarks again

(venturebeat.com) by gmays | view | 2 comments

RIP Windows: Linux GPU Gaming Benchmarks on Bazzite [video]

(youtube.com) by doener | view | 0 comments

Runway rolls out new AI video model that beats Google, OpenAI in key benchmark

(cnbc.com) by tiahura | view | 3 comments

Benchmarking NVENC video transcoding on the Pi

(jeffgeerling.com) by ingve | view | 0 comments

Apache Iceberg vs. Databricks – benchmarked

(olake.io) by Cappybara12 | view | 2 comments

How to Benchmark Rust Code

(codspeed.io) by adriencaccia | view | 0 comments

Cline-Bench: A Real-World, Open-Source Benchmark for Agentic Coding

(cline.bot) by janpio | view | 0 comments

We Benchmarked Frontier LLMs on Defensive Security. The Results Surprised Us

(cotool.ai) by logancarmody | view | 2 comments

Why Alpha Arena was a bad benchmark

(borisagain.substack.com) by mpavlov | view | 0 comments

How to Benchmark C++ Code

(codspeed.io) by art049 | view | 0 comments

Benchmarking KDB-X vs. QuestDB, ClickHouse, TimescaleDB and InfluxDB

(kx.com) by rustc | view | 0 comments

How to Benchmark Python Code?

(codspeed.mintlify.dev) by adriencaccia | view | 0 comments

Show HN: Benchmark your team's AI coding security posture

by jaimefjorge | view | 0 comments

Show HN: Cua-Bench – a benchmark for AI agents in GUI environments

(github.com) by someguy101010 | view | 0 comments

N8n vs. Nyno for Python Automation: The Benchmarks and Why Nyno Is Much Faster

(nyno.dev) by theyogadev | view | 1 comment

Show HN: Agent Runner – open-source agent harness to benchmark real coding

(designarena.ai) by grace77 | view | 0 comments

Benchmarking my Redis clone in Zig (a web dev learning systems)

(charlesfonseca.substack.com) by barddoo | view | 1 comment

Hetzner Servers Benchmark

(softuts.com) by XCSme | view | 1 comment

Show HN: I benchmarked LLM agents on fixing real-world security vulnerabilities

(giovannigatti.github.io) by ggattip | view | 0 comments

LLM Persuasion Benchmark: Multi-Turn Persuasion Between Models

(github.com) by zone411 | view | 0 comments

KB Arena – benchmark RAG strategies on your docs (open source)

(github.com) by xmpuspus | view | 2 comments

What those AI benchmark numbers mean

(ngrok.com) by samwho | view | 1 comment

CodSpeed CLI: Deterministic benchmarking for any executable

(github.com) by art049 | view | 0 comments

Show HN: I benchmarked read latency of AWS S3, S3Express, EBS and Instance store

(nixiesearch.substack.com) by shutty | view | 0 comments

Human behavior isn't coherent enough to be a benchmark for AI

(kemendo.com) by AndrewKemendo | view | 0 comments

Benchmarking GPT-5.1 vs. Gemini 3.0 vs. Opus 4.5 across 3 Coding Tasks

(blog.kilo.ai) by heymax054 | view | 0 comments

80.1 % on LoCoMo Long-Term Memory Benchmark with a pure open-source RAG pipeline