▲
271
CUDA Ontology
(jamesakl.com)
by gugagore |
view
|
40 comments
▲
132
CUDA-l2: Surpassing cuBLAS performance for matrix multiplication through RL
(github.com)
by dzign |
view
|
15 comments
▲
88
Why CUDA translation won't unlock AMD
(eliovp.com)
by JonChesterfield |
view
|
81 comments
▲
22
Parrot – A C++ library for fused array operations using CUDA/Thrust
(nvlabs.github.io)
by operator-name |
view
|
2 comments
▲
17
Apex GPU: Run CUDA Apps on AMD GPUs Without Recompilation
(github.com)
by ArchitectAI |
view
|
6 comments
▲
11
Show HN: CUDA, Shmuda: Fold Proteins on a MacBook
(latentspacecraft.com)
by geoffitect |
view
|
3 comments
▲
6
Nvidia CUDA Tile
(developer.nvidia.com)
by elkguy |
view
|
0 comments
▲
5
Show HN: Clangd for CUDA Device Code
(docs.scale-lang.com)
by JonChesterfield |
view
|
0 comments
▲
5
Nvidia's B200: Keeping the CUDA Juggernaut Rolling Ft Verda, Formerly DataCrunch
(chipsandcheese.com)
by rbanffy |
view
|
0 comments
▲
5
Nvidia cuTile: Python DSL and a new IR for tile-based CUDA kernels
(github.com)
by ashvardanian |
view
|
0 comments
▲
4
NVIDIA CUDA Tile programming model
(developer.nvidia.com)
by tanelpoder |
view
|
0 comments
▲
4
Show HN: modal-cuda – CLI to run CUDA .cu programs on Modal GPUs
(github.com)
by Sai_Praneeth |
view
|
0 comments
▲
3
AI-Written CUDA Kernels Outperform Nvidia's Best Matmul Library
(rohan-paul.com)
by dzign |
view
|
0 comments
▲
3
An Agent Framework with Hardware Feedback for CUDA Kernel Optimization
(arxiv.org)
by PaulHoule |
view
|
0 comments
▲
3
Understanding the CUDA Compiler and PTX with a Top-K Kernel
(blog.alpindale.net)
by mfiguiere |
view
|
0 comments
▲
2
CUDA Released in Basic
(developer.nvidia.com)
by apples2apples |
view
|
0 comments
▲
2
Nvidia CUDA Tile
(developer.nvidia.com)
by apples2apples |
view
|
0 comments
▲
2
CUDA Tile
(techpowerup.com)
by dagmx |
view
|
0 comments
▲
2
Show HN: Free GPUs in your terminal for learning CUDA
(github.com)
by RohanAdwankar |
view
|
0 comments
▲
2
CUDA-Q Back Ends: Quantum Hardware (QPU)
(nvidia.github.io)
by westurner |
view
|
3 comments
▲
1
From 800ms to ~25ms: harness-driven optimization of a CUDA matmul kernel
(github.com)
by icyace |
view
|
0 comments
▲
1
Show HN: Mixlab, an ML arch lab in Go. JSON config, Metal and CUDA, 1.6s builds
(github.com)
by mrothroc |
view
|
0 comments
▲
1
TurboOCR: CUDA and TensorRT OCR Server at 270 img/s
(github.com)
by pfdomizer |
view
|
0 comments
▲
1
Parrot is a C++ library for fused array operations using CUDA/Thrust
(github.com)
by tosh |
view
|
0 comments
▲
1
Show HN: A web-based replacement for Nvidia's CUDA occupancy spreadsheet
(toolbelt.widgita.xyz)
by fairlight1337 |
view
|
0 comments
▲
1
One-command local AI stack setup for Ubuntu (CUDA, Ollama, llama.cpp, chat UIs)
(github.com)
by christianbusch |
view
|
0 comments
▲
1
Agent Wispr – Local dictation for terminals (CUDA, local model, cross-platform)
(agentwispr.joshlehman.ca)
by JoshuaLehman |
view
|
0 comments
▲
1
Show HN: TurboOCR up to 1200 pages/s with Paddle and TensorRT (C++/CUDA, FP16)
(github.com)
by pfdomizer |
view
|
0 comments
▲
1
Challenges in Decompilation and Reverse Engineering of CUDA Kernels [video]
(youtube.com)
by nicolodev |
view
|
0 comments
▲
1
TurboOCR: 270–1200 img/s OCR with Paddle and TensorRT (C++/CUDA, FP16)
(github.com)
by pfdomizer |
view
|
0 comments
▲
1
Taking on CUDA with ROCm: 'One Step After Another'
(eetimes.com)
by mindcrime |
view
|
0 comments
▲
1
CUDA Programming for Nvidia H100s
(freecodecamp.org)
by eigenBasis |
view
|
0 comments
▲
1
Model2Kernel: Model-Aware Symbolic Execution for Safe CUDA Kernels
(arxiv.org)
by PaulHoule |
view
|
0 comments
▲
1
CUDA Tile is the biggest GPU programming shift in 20 years
(pub.towardsai.net)
by Aedelon |
view
|
0 comments
▲
1
Python All the Way Down: Speed-of-Light CUDA Without Leaving Python
(nvidia.com)
by pjmlp |
view
|
0 comments
▲
1
MXFP8 GEMM: Up to 99% of cuBLAS Performance Using CUDA and PTX
(danielvegamyhre.github.io)
by matt_d |
view
|
0 comments
▲
1
Llama.cpp with CUDA Support on Original Jetson Nano (4GB)
(github.com)
by Abishek_Muthian |
view
|
0 comments
▲
1
CUDA VRAM overcommit support for Linux
(old.reddit.com)
by itvision |
view
|
0 comments
▲
1
Challenges and Design Issues in Finding CUDA Bugs via GPU-Native Fuzzing
(arxiv.org)
by matt_d |
view
|
0 comments
▲
1
Show HN: Mamba SSM in Rust – training and inference with custom CUDA kernels
(github.com)
by silvermpx |
view
|
0 comments
▲
1
GPU Lite for CUDA without the bloat
(github.com)
by HaoZeke |
view
|
0 comments
▲
1
Challenges in Decompilation and RE of CUDA-Based Kernels [video]
(youtube.com)
by nicolodev |
view
|
0 comments
▲
1
MLX: CUDA
(ml-explore.github.io)
by tosh |
view
|
0 comments
▲
1
Show HN: CUDA Farm – 65 physics endpoints that can't hallucinate
(cudafarm-landing.vercel.app)
by kluton |
view
|
0 comments
▲
1
CUDA 13.2 Introduces Enhanced CUDA Tile Support and New Python Features
(developer.nvidia.com)
by pjmlp |
view
|
0 comments
▲
1
Cutile.jl Brings Nvidia CUDA Tile-Based Programming to Julia
(developer.nvidia.com)
by pjmlp |
view
|
0 comments
▲
1
CUDA-morph: PyTorch .cuda() code on AMD/Intel/Ascend without rewrites
(github.com)
by josephahn291215 |
view
|
0 comments
▲
1
Show HN: Pu-erh Lab, a CUDA-accelerated RAW photo editor
(github.com)
by yurunzi |
view
|
1 comment
▲
1
CUDA Unified Memory Analyzer – measure what your GPU is doing with memory
(github.com)
by gpu_systems |
view
|
1 comment
▲
1
Challenges in Decompilation and Reverse Engineering of CUDA-Based Kernels [pdf]
(nicolo.dev)
by matt_d |
view
|
0 comments