News
▲
124
Benchmarking leading AI agents against Google reCAPTCHA v2
(research.roundtable.ai)
by mdahardy |
view
|
97 comments
▲
53
Drawing Text Isn't Simple: Benchmarking Console vs. Graphical Rendering
(cv.co.hu)
by PaulHoule |
view
|
41 comments
▲
31
How Good Are Chinese CPUs? Benchmarking the Loongson 3A6000
(lemire.me)
by ashvardanian |
view
|
1 comment
▲
28
Benchmarking the Most Reliable Document Parsing API
(tensorlake.ai)
by calavera |
view
|
14 comments
▲
10
Benchmarking NVENC video transcoding on the Pi
(jeffgeerling.com)
by ingve |
view
|
0 comments
▲
5
Benchmarking KDB-X vs. QuestDB, ClickHouse, TimescaleDB and InfluxDB
(kx.com)
by rustc |
view
|
0 comments
▲
4
Benchmarking my Redis clone in Zig (a web dev learning systems)
(charlesfonseca.substack.com)
by barddoo |
view
|
1 comment
▲
3
CodSpeed CLI: Deterministic benchmarking for any executable
(github.com)
by art049 |
view
|
0 comments
▲
3
Benchmarking GPT-5.1 vs. Gemini 3.0 vs. Opus 4.5 across 3 Coding Tasks
(blog.kilo.ai)
by heymax054 |
view
|
0 comments
▲
3
Benchmarking LLMs at the Frontier of Physics
(artificialanalysis.ai)
by mustaphah |
view
|
0 comments
▲
3
Benchmarking Language Implementations: Am I doing it right? Get Early Feedback
(stefan-marr.de)
by speckx |
view
|
0 comments
▲
3
Powering AI at Scale: Benchmarking 1B Vectors in YugabyteDB
(yugabyte.com)
by ashvardanian |
view
|
0 comments
▲
3
Benchmarking the Cost of Java's EnumSet – A Second Look
(kinnen.de)
by birdculture |
view
|
0 comments
▲
3
Benchmarking multilingual long-context language models
(arxiv.org)
by sysoleg |
view
|
0 comments
▲
2
Reverse Benchmarking
(dominiknitsch.com)
by wseqyrku |
view
|
0 comments
▲
2
Benchmarking node collision algorithms for React/Svelte Flow
(xyflow.com)
by moklick |
view
|
0 comments
▲
2
Show HN: Benchmark-ips-Python – benchmarking tool for Python
(github.com)
by Igor_Wiwi |
view
|
0 comments
▲
2
Benchmarking Checksum Tools
(heitorpb.github.io)
by furkansahin |
view
|
0 comments
▲
2
Dell Pro Max with GB10 Arrives for Linux Performance Benchmarking Review
(phoronix.com)
by rbanffy |
view
|
0 comments
▲
2
Benchmarking the Thomson Reuters legal agent
(thomsonreuters.com)
by gk1 |
view
|
0 comments
▲
2
Benchmarking the AMD EPYC 9V64H: Azure HBv5's Custom AMD CPU with HBM3
(phoronix.com)
by ashvardanian |
view
|
0 comments
▲
1
Benchmarking OpenAI's Privacy Filter
(tonic.ai)
by akamor |
view
|
0 comments
▲
1
Benchmarking How Postgres Scales
(dbos.dev)
by KraftyOne |
view
|
0 comments
▲
1
Benchmarking glibc vs. jemalloc vs. mimalloc vs. tcmalloc in MariaDB TPC-C
(tidesdb.com)
by alexpadula |
view
|
1 comment
▲
1
RoboLab: Robot- and policy-agnostic simulation benchmarking
(research.nvidia.com)
by dagli |
view
|
0 comments
▲
1
Benchmarking DuckDB from Java: Fast Insert, Update, and Delete
(sqg.dev)
by uwemaurer |
view
|
0 comments
▲
1
Benchmarking Cloud vs. Local LLMs Why back end choice matters more than quant
(arxiv.org)
by tleitch |
view
|
0 comments
▲
1
Show HN: Benchmarking how AI models write vulnerable code under pressure
(leaderboard.atella.ai)
by kitdobyns |
view
|
0 comments
▲
1
Benchmarking open-weight models for security research
(dualuse.dev)
by lebovic |
view
|
0 comments
▲
1
Shyell – a Rust shell with built-in benchmarking and project-aware prompts
(github.com)
by paperplaneflyr |
view
|
0 comments
▲
1
Cryload: Powerful HTTP Benchmarking Tool Written in Crystal
(github.com)
by sdogruyol |
view
|
1 comment
▲
1
Benchmarking FFmpeg's H.265 Options
(scottstuff.net)
by anw |
view
|
0 comments
▲
1
I "Rewrote" My ORM Again with AI. and Ended Up Benchmarking Every PHP ORM
(technex.us)
by hparadiz |
view
|
0 comments
▲
1
Precision over perception: Why architecture matters in benchmarking
(redhat.com)
by salkahfi |
view
|
0 comments
▲
1
Quantization, LoRA, and the 8% Problem Benchmarking Local LLMs for Production AI
(walsenburgtech.com)
by cowartc |
view
|
0 comments
▲
1
Benchmarking LLMs with Marimo Pair
(ericmjl.github.io)
by akshayka |
view
|
0 comments
▲
1
Benchmarking LLM Tool-Use in the Wild
(arxiv.org)
by Brajeshwar |
view
|
0 comments
▲
1
SlopCodeBench: Benchmarking How Coding Agents Degrade over Long-Horizon Task
(arxiv.org)
by mohsen1 |
view
|
0 comments
▲
1
TELeR: A General Taxonomy of LLM Prompts for Benchmarking Complex Tasks (2023)
(arxiv.org)
by locknitpicker |
view
|
0 comments
▲
1
Inferena – benchmarking inference of popular models on consumer hardware
(inferena.tech)
by kvark |
view
|
0 comments
▲
1
The need for better compiler frontend benchmarks: Carbon's benchmarking approach
(discourse.llvm.org)
by matt_d |
view
|
0 comments
▲
1
PGO to Post-Link Optimization: Exploring and Benchmarking Clang (Part 1)
(sidchintamaneni.com)
by blockholder |
view
|
0 comments
▲
1
Benchmarking Permutations
(koaning.io)
by Brajeshwar |
view
|
0 comments
▲
1
Show HN: Benchmarking LLMs through autonomous games of Blood on the Clocktower
(clocktower-radio.com)
by cjami |
view
|
0 comments
▲
1
Benchmarking quantum simulation with neutron-scattering experiments
(arxiv.org)
by rbanffy |
view
|
0 comments
▲
1
SlopCodeBench: Benchmarking How Coding Agents Degrade over Long-Horizon Tasks
(scbench.ai)
by matt_d |
view
|
0 comments
▲
1
SlopCodeBench: Benchmarking How Coding Agents Degrade over Long-Horizon Tasks
(arxiv.org)
by FiberBundle |
view
|
0 comments
▲
1
Show HN: ClawHub Skills Benchmarking – Find Bugs, Drift, and Slowdowns
(github.com)
by just-claw-it |
view
|
0 comments
▲
1
An open source benchmarking framework for IT automation
(github.com)
by pranay01 |
view
|
0 comments
▲
1
Show HN: Agonora – Character benchmarking for the post-AI job market
(agonora.com)
by mw67 |
view
|
0 comments