▲
271
CUDA Ontology
(jamesakl.com)
by gugagore |
view
|
40 comments
▲
132
CUDA-l2: Surpassing cuBLAS performance for matrix multiplication through RL
(github.com)
by dzign |
view
|
15 comments
▲
88
Why CUDA translation won't unlock AMD
(eliovp.com)
by JonChesterfield |
view
|
81 comments
▲
22
Parrot – A C++ library for fused array operations using CUDA/Thrust
(nvlabs.github.io)
by operator-name |
view
|
2 comments
▲
17
Apex GPU: Run CUDA Apps on AMD GPUs Without Recompilation
(github.com)
by ArchitectAI |
view
|
6 comments
▲
11
Show HN: CUDA, Shmuda: Fold Proteins on a MacBook
(latentspacecraft.com)
by geoffitect |
view
|
3 comments
▲
6
Nvidia CUDA Tile
(developer.nvidia.com)
by elkguy |
view
|
0 comments
▲
5
Show HN: Clangd for CUDA Device Code
(docs.scale-lang.com)
by JonChesterfield |
view
|
0 comments
▲
5
Nvidia's B200: Keeping the CUDA Juggernaut Rolling Ft Verda, Formerly DataCrunch
(chipsandcheese.com)
by rbanffy |
view
|
0 comments
▲
5
Nvidia cuTile: Python DSL and a new IR for tile-based CUDA kernels
(github.com)
by ashvardanian |
view
|
0 comments
▲
4
NVIDIA CUDA Tile programming model
(developer.nvidia.com)
by tanelpoder |
view
|
0 comments
▲
4
Show HN: modal-cuda – CLI to run CUDA .cu programs on Modal GPUs
(github.com)
by Sai_Praneeth |
view
|
0 comments
▲
3
AI-Written CUDA Kernels Outperforms Nvidia's Best Matmul Library
(rohan-paul.com)
by dzign |
view
|
0 comments
▲
3
An Agent Framework with Hardware Feedback for CUDA Kernel Optimization
(arxiv.org)
by PaulHoule |
view
|
0 comments
▲
3
Understanding the CUDA Compiler and PTX with a Top-K Kernel
(blog.alpindale.net)
by mfiguiere |
view
|
0 comments
▲
2
Evaluating CUDA Tile for AI Workloads on Hopper and Blackwell GPUs
(arxiv.org)
by matt_d |
view
|
0 comments
▲
2
CUDA Released in Basic
(developer.nvidia.com)
by apples2apples |
view
|
0 comments
▲
2
Nvidia CUDA Tile
(developer.nvidia.com)
by apples2apples |
view
|
0 comments
▲
2
CUDA Tile
(techpowerup.com)
by dagmx |
view
|
0 comments
▲
2
Show HN: Free GPUs in your terminal for learning CUDA
(github.com)
by RohanAdwankar |
view
|
0 comments
▲
2
CUDA-Q Back Ends: Quantum Hardware (QPU)
(nvidia.github.io)
by westurner |
view
|
3 comments
▲
1
Nvidia CUDA Python 1.0 and CUDA 13.3 Release
(developer.nvidia.com)
by ashvardanian |
view
|
0 comments
▲
1
AutoMegaKernel: Compiling a LLM into a single CUDA kernel
(arxiv.org)
by OsamaJaber |
view
|
0 comments
▲
1
What about OpenCL and CUDA C++ alternatives?
(modular.com)
by eatonphil |
view
|
0 comments
▲
1
AutoMegaKernel: Compile an LLM into one provably-correct CUDA megakernel
(github.com)
by OsamaJaber |
view
|
0 comments
▲
1
Rl.cu: Training LLM RL with Pure CUDA
(github.com)
by KJL0508 |
view
|
0 comments
▲
1
Tiny hackable CUDA language model implementation
(github.com)
by markusheimerl |
view
|
0 comments
▲
1
Compiling Zig Kernels to CUDA PTX (Zigton Phase 1)
(portfolio.lovesahaj1225.workers.dev)
by lovesahaj |
view
|
1 comment
▲
1
How to Optimize a CUDA Matmul Kernel for cuBLAS-Like Performance: A Worklog
(siboehm.com)
by Areibman |
view
|
0 comments
▲
1
When does fragmentation occur in the CUDA caching allocator?
(docs.pytorch.org)
by matt_d |
view
|
0 comments
▲
1
Show HN: Tiny-vLLM – high performance LLM inference engine in C++ and CUDA
(github.com)
by yu3zhou4 |
view
|
0 comments
▲
1
Pushing memory bound CUDA kernels past the speed of light with data compression
(fergusfinn.com)
by somnial |
view
|
0 comments
▲
1
Nvidia CUDA 13.3 Rolls Out CUDA Python 1.0, CUDA Tile for C++
(phoronix.com)
by Bender |
view
|
0 comments
▲
1
Ternative – C++/CUDA inference engine for ternary LLMs with runtime LoRA
(github.com)
by michelangeloro |
view
|
1 comment
▲
1
Cutting inference cold starts by 40x with LP, FUSE, C/R, and CUDA-checkpoint
(modal.com)
by charles_irl |
view
|
0 comments
▲
1
CUDA Books
(github.com)
by dariubs |
view
|
0 comments
▲
1
Nvidia/Numba-CUDA-mlir: CUDA C++-style Python GPU compiler built on MLIR
(github.com)
by nickwanninger |
view
|
0 comments
▲
1
CUDA Proves Nvidia Is a Software Company
(wired.com)
by bookofjoe |
view
|
1 comment
▲
1
Atlas: An LLM inference engine written from scratch in Rust and CUDA
(atlasinference.io)
by emrehan |
view
|
0 comments
▲
1
The creators of SQL, C++, CUDA, Haskell reviewed my book on Programming History
(helloworldthebook.com)
by DaleBiagio |
view
|
0 comments
▲
1
CUDA Proves Nvidia Is a Software Company
(wired.com)
by Brajeshwar |
view
|
0 comments
▲
1
CUDA-oxide: Nvidia's official Rust to CUDA compiler
(nvlabs.github.io)
by adamnemecek |
view
|
0 comments
▲
1
Nvidia releases CUDA-Oxide 0.1 for experimental Rust-to-CUDA compiler
(phoronix.com)
by birdculture |
view
|
0 comments
▲
1
Experimental Rust-to-CUDA Compiler
(github.com)
by cgravill |
view
|
1 comment
▲
1
An experimental Rust-to-CUDA compiler from Nvidia
(nvlabs.github.io)
by chenzhekl |
view
|
0 comments
▲
1
Nvidia introduces back end for CUDA kernels in Rust
(github.com)
by ketchup32613 |
view
|
0 comments
▲
1
CUDA-oxide, a Rust-to-CUDA compiler
(nvlabs.github.io)
by lacker |
view
|
0 comments
▲
1
CUDA-oxide an experimental Rust-to-CUDA compiler
(github.com)
by tzury |
view
|
0 comments
▲
1
cuda-oxide: a custom rustc backend for compiling GPU kernels in pure Rust
(github.com)
by matt_d |
view
|
0 comments
▲
1
Cutile.jl 0.3: CUDA.jl integration, and better performance and latency
(juliagpu.org)
by vchuravy |
view
|
0 comments