▲
181
Trying out Gemini 3 Pro with audio transcription and a new pelican benchmark
(simonwillison.net)
by nabla9 |
view
|
58 comments
▲
124
Benchmarking leading AI agents against Google reCAPTCHA v2
(research.roundtable.ai)
by mdahardy |
view
|
97 comments
▲
113
New benchmark shows top LLMs struggle in real mental health care
(swordhealth.com)
by RicardoRei |
view
|
157 comments
▲
74
My Life Is a Lie: How a Broken Benchmark Broke America
(yesigiveafig.com)
by jger15 |
view
|
50 comments
▲
59
Show HN: DNS Benchmark Tool – Compare and monitor resolvers
(github.com)
by ovo101 |
view
|
30 comments
▲
53
Drawing Text Isn't Simple: Benchmarking Console vs. Graphical Rendering
(cv.co.hu)
by PaulHoule |
view
|
41 comments
▲
43
Linux kernel patch from Thomas Gleixner improves Postgres benchmark by 15%
(lore.kernel.org)
by throwaway2037 |
view
|
2 comments
▲
43
AI Capabilities May Be Overhyped on Bogus Benchmarks, Study Finds
(gizmodo.com)
by Cynddl |
view
|
17 comments
▲
37
JavaScript Engines Benchmarks
(ivankra.github.io)
by homebrewer |
view
|
2 comments
▲
31
How Good Are Chinese CPUs? Benchmarking the Loongson 3A6000
(lemire.me)
by ashvardanian |
view
|
1 comment
▲
28
Benchmarking the Most Reliable Document Parsing API
(tensorlake.ai)
by calavera |
view
|
14 comments
▲
26
JanitorBench: A new LLM benchmark for multi-turn chats
(about.janitorai.com)
by shep101 |
view
|
6 comments
▲
25
Poetiq shatters ARC-AGI 2 benchmark at half the cost
(poetiq.ai)
by flavio87 |
view
|
3 comments
▲
25
AMD vs. Intel: A Unicode Benchmark
(lemire.me)
by ibobev |
view
|
2 comments
▲
13
RIP Windows: Linux GPU Gaming Benchmarks on Bazzite [video]
(youtube.com)
by doener |
view
|
0 comments
▲
11
Runway rolls out new AI video model that beats Google, OpenAI in key benchmark
(cnbc.com)
by tiahura |
view
|
3 comments
▲
10
Benchmarking NVENC video transcoding on the Pi
(jeffgeerling.com)
by ingve |
view
|
0 comments
▲
9
Apache Iceberg vs. Databricks – benchmarked
(olake.io)
by Cappybara12 |
view
|
2 comments
▲
7
How to Benchmark Rust Code
(codspeed.io)
by adriencaccia |
view
|
0 comments
▲
6
Cline-Bench: A Real-World, Open-Source Benchmark for Agentic Coding
(cline.bot)
by janpio |
view
|
0 comments
▲
6
We Benchmarked Frontier LLMs on Defensive Security. The Results Surprised Us
(cotool.ai)
by logancarmody |
view
|
2 comments
▲
6
Why Alpha Arena was a bad benchmark
(borisagain.substack.com)
by mpavlov |
view
|
0 comments
▲
5
How to Benchmark C++ Code
(codspeed.io)
by art049 |
view
|
0 comments
▲
5
Benchmarking KDB-X vs. QuestDB, ClickHouse, TimescaleDB and InfluxDB
(kx.com)
by rustc |
view
|
0 comments
▲
5
How to Benchmark Python Code?
(codspeed.mintlify.dev)
by adriencaccia |
view
|
0 comments
▲
5
Show HN: Benchmark your team's AI coding security posture
by jaimefjorge |
view
|
0 comments
▲
4
N8n vs. Nyno for Python Automation: The Benchmarks and Why Nyno Is Much Faster
(nyno.dev)
by theyogadev |
view
|
1 comment
▲
4
Show HN: Agent Runner – open-source agent harness to benchmark real coding
(designarena.ai)
by grace77 |
view
|
0 comments
▲
4
Benchmarking my Redis clone in Zig (a web dev learning systems)
(charlesfonseca.substack.com)
by barddoo |
view
|
1 comment
▲
4
Hetzner Servers Benchmark
(softuts.com)
by XCSme |
view
|
1 comment
▲
3
Show HN: I benchmarked read latency of AWS S3, S3Express, EBS and Instance store
(nixiesearch.substack.com)
by shutty |
view
|
0 comments
▲
3
Human behavior isn't coherent enough to be a benchmark for AI
(kemendo.com)
by AndrewKemendo |
view
|
0 comments
▲
3
Benchmarking GPT-5.1 vs. Gemini 3.0 vs. Opus 4.5 across 3 Coding Tasks
(blog.kilo.ai)
by heymax054 |
view
|
0 comments
▲
3
80.1% on LoCoMo Long-Term Memory Benchmark with a pure open-source RAG pipeline
by ViktorKuz |
view
|
0 comments
▲
3
Benchmarking LLMs at the Frontier of Physics
(artificialanalysis.ai)
by mustaphah |
view
|
0 comments
▲
3
Snapdragon X2 Elite laptop chip benchmarks
(tomsguide.com)
by walterbell |
view
|
1 comment
▲
3
PrinceJS: Benchmark Corrections and Lessons from a 13-Year-Old Developer
(github.com)
by lilprince1218 |
view
|
1 comment
▲
3
DNALONGBENCH: A benchmark suite for long-range DNA prediction tasks
(nature.com)
by bookofjoe |
view
|
0 comments
▲
3
Benchmarking Language Implementations: Am I doing it right? Get Early Feedback
(stefan-marr.de)
by speckx |
view
|
0 comments
▲
3
Official LIGO-Virgo-Kagra Benchmark Shows KFR Outperforming FFTW in CERN Root
(ar5iv.labs.arxiv.org)
by danlcaza |
view
|
0 comments
▲
3
Exasol Outperforms ClickHouse by 10x on TPC-H Analytical Benchmark
(exasol.com)
by one-random-geek |
view
|
0 comments
▲
3
Powering AI at Scale: Benchmarking 1B Vectors in YugabyteDB
(yugabyte.com)
by ashvardanian |
view
|
0 comments
▲
3
Benchmarking the Cost of Java's EnumSet – A Second Look
(kinnen.de)
by birdculture |
view
|
0 comments
▲
3
Benchmarking multilingual long-context language models
(arxiv.org)
by sysoleg |
view
|
0 comments
▲
3
Measuring What Matters: Construct Validity in Large Language Model Benchmarks
(oxrml.com)
by Cynddl |
view
|
2 comments
▲
2
Serenely Fast I/O Buffer benchmarked
(serenedb.com)
by mkornaukhov |
view
|
0 comments
▲
2
Chess LLM Benchmark: Evaluating LLMs' ability to play chess
(github.com)
by dwohnitmok |
view
|
0 comments
▲
2
Reverse Benchmarking
(dominiknitsch.com)
by wseqyrku |
view
|
0 comments
▲
2
FreeBSD 15.0 Benchmarks Versus FreeBSD 14.3
(phoronix.com)
by doener |
view
|
0 comments
▲
2
The Gap for LLMs Isn't Benchmarks – It's Everyday Value
(github.com)
by bill3389 |
view
|
1 comment