▲ 271 | CUDA Ontology (jamesakl.com) | by gugagore | 40 comments
▲ 132 | CUDA-l2: Surpassing cuBLAS performance for matrix multiplication through RL (github.com) | by dzign | 15 comments
▲ 88 | Why CUDA translation wont unlock AMD (eliovp.com) | by JonChesterfield | 81 comments
▲ 22 | Parrot – A C++ library for fused array operations using CUDA/Thrust (nvlabs.github.io) | by operator-name | 2 comments
▲ 17 | Apex GPU: Run CUDA Apps on AMD GPUs Without Recompilation (github.com) | by ArchitectAI | 6 comments
▲ 11 | Show HN: CUDA, Shmuda: Fold Proteins on a MacBook (latentspacecraft.com) | by geoffitect | 3 comments
▲ 6 | Nvidia CUDA Tile (developer.nvidia.com) | by elkguy | 0 comments
▲ 5 | Nvidia's B200: Keeping the CUDA Juggernaut Rolling Ft Verda, Formerly DataCrunch (chipsandcheese.com) | by rbanffy | 0 comments
▲ 5 | Nvidia cuTile: Python DSL and a new IR for tile-based CUDA kernels (github.com) | by ashvardanian | 0 comments
▲ 4 | NVIDIA CUDA Tile programming model (developer.nvidia.com) | by tanelpoder | 0 comments
▲ 4 | Show HN: modal-cuda – CLI to run CUDA .cu programs on Modal GPUs (github.com) | by Sai_Praneeth | 0 comments
▲ 3 | AI-Written CUDA Kernels Outperforms Nvidia's Best Matmul Library (rohan-paul.com) | by dzign | 0 comments
▲ 3 | An Agent Framework with Hardware Feedback for CUDA Kernel Optimization (arxiv.org) | by PaulHoule | 0 comments
▲ 3 | Understanding the CUDA Compiler and PTX with a Top-K Kernel (blog.alpindale.net) | by mfiguiere | 0 comments
▲ 2 | Nvidia CUDA Tile (developer.nvidia.com) | by apples2apples | 0 comments
▲ 2 | CUDA Tile (techpowerup.com) | by dagmx | 0 comments
▲ 2 | Show HN: Free GPUs in your terminal for learning CUDA (github.com) | by RohanAdwankar | 0 comments
▲ 2 | CUDA-Q Back Ends: Quantum Hardware (QPU) (nvidia.github.io) | by westurner | 3 comments
▲ 1 | Large-Scale Agentic RL for CUDA Kernel Generation (cuda-agent.github.io) | by gmays | 0 comments
▲ 1 | Cutile.jl: Tile-Based GPU Programming for CUDA GPUs (discourse.julialang.org) | by KenoFischer | 0 comments
▲ 1 | Cutile.jl Brings CUDA Tile-Based Programming to Julia (developer.nvidia.com) | by adgjlsfhk1 | 0 comments
▲ 1 | CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation (arxiv.org) | by petethomas | 0 comments
▲ 1 | Show HN: Dust – Device Unified Serving Toolkit (CUDA for Phones) (rogelioruiz.github.io) | by ruizprogelio | 1 comment
▲ 1 | AdaptiveCpp's new Metal backend to support CUDA dialect on Apple GPUs (github.com) | by puschkinfr | 0 comments
▲ 1 | Custom FP4 CUDA Kernel – 129 Tflops on DGX Spark with Pre-Quantized Weight Cache (forums.developer.nvidia.com) | by vkaufmann | 1 comment
▲ 1 | Show HN: A Black Hole Simulator in CUDA C++ (github.com) | by anwoy | 0 comments
▲ 1 | Python All the Way Down: Speed-of-Light CUDA Without Leaving Python (nvidia.com) | by pjmlp | 0 comments
▲ 1 | BarraCUDA Open-source CUDA compiler targeting AMD GPUs (github.com) | by rurban | 0 comments
▲ 1 | Simple CUDA-checkpoint wrapper to freeze and restore GPU processes quickly (github.com) | by shayonj | 0 comments
▲ 1 | UltrafastSecp256k1 Zero-dep C++20 secp256k1 with ASM,CUDA, 27 coins,MuSig2,FROST (github.com) | by shrecshrec | 1 comment
▲ 1 | Building a Zero-Dependency secp256k1 CUDA Engine from Scratch (2.5B ops/SEC) (github.com) | by shrecshrec | 1 comment
▲ 1 | Show HN: Qeep – A deep learning framework written in Go with AutoGrad and CUDA (github.com) | by sahands | 1 comment
▲ 1 | Show HN: Librediffusion: C++ / CUDA Reimplementation of StreamDiffusion (github.com) | by jcelerier | 0 comments
▲ 1 | Show HN: Qeek – Go Deep‑Learning Framework with Tensors, AutoGrad and CUDA (github.com) | by avestura | 0 comments
▲ 1 | We got Claude to teach open models how to write CUDA kernels (huggingface.co) | by mooreds | 0 comments
▲ 1 | We got Claude to teach open models how to write CUDA kernels (huggingface.co) | by amrrs | 0 comments
▲ 1 | Show HN: Universal DeepSeek OCR 2 – CPU, MPS, CUDA Support (github.com) | by dogacel | 0 comments
▲ 1 | Claude Code Ported LeelaChessZero CUDA Back End to ROCm: End of CUDA Moat (github.com) | by CalChris | 0 comments
▲ 1 | Triton CUDA Tile IR Back End (github.com) | by my123 | 0 comments
▲ 1 | Cornell Virtual Workshop: Introduction to CUDA (cvw.cac.cornell.edu) | by vinhnx | 0 comments
▲ 1 | CUDA Programming: From Zero to GPU Kernels – A Beginner's Guide (pythongiant.github.io) | by pythongiant | 1 comment
▲ 1 | Iro-CUDA-FFI – Rust orchestrates nvcc‑compiled CUDA C++ kernels (github.com) | by tribe-iro | 1 comment
▲ 1 | Nvidia's CUDA libraries can be generic and not optimized for LLM inference (github.com) | by venkat_2811 | 1 comment
▲ 1 | Batmobile: 10-20x Faster CUDA Kernels for Equivariant Graph Neural Networks (elliotarledge.com) | by ipnon | 0 comments
▲ 1 | How to Beat Unsloth's CUDA Kernel Using Mojo–With Zero GPU Experience (modular.com) | by timmyd | 1 comment
▲ 1 | Zluda run unmodified CUDA on non Nvidia hw (phoronix.com) | by gigatexal | 0 comments
▲ 1 | Real Time Speech to Text Running Whisper.cpp with CUDA on Jetson Orin Nano Super (thomasthelliez.com) | by thomasthelliez | 1 comment
▲ 1 | AWS Trainium vs. Nvidia CUDA for Medical Image Classification (medrxiv.org) | by salty_frog | 0 comments
▲ 1 | Show HN: USM-Core Header-only CUDA library for ragged reductions(2.5x baseline) (github.com) | by ottoselymesi | 0 comments
▲ 1 | Pyimagecuda-studio: Design image pipelines visually. Automate with Python (github.com) | by thunderbong | 0 comments