Search | News by Netwrck

CUDA Ontology

(jamesakl.com) by gugagore | view | 40 comments

CUDA-l2: Surpassing cuBLAS performance for matrix multiplication through RL

(github.com) by dzign | view | 15 comments

Why CUDA translation won't unlock AMD

(eliovp.com) by JonChesterfield | view | 81 comments

Parrot – A C++ library for fused array operations using CUDA/Thrust

(nvlabs.github.io) by operator-name | view | 2 comments

Apex GPU: Run CUDA Apps on AMD GPUs Without Recompilation

(github.com) by ArchitectAI | view | 6 comments

Show HN: CUDA, Shmuda: Fold Proteins on a MacBook

(latentspacecraft.com) by geoffitect | view | 3 comments

Nvidia CUDA Tile

(developer.nvidia.com) by elkguy | view | 0 comments

Show HN: Clangd for CUDA Device Code

(docs.scale-lang.com) by JonChesterfield | view | 0 comments

Nvidia's B200: Keeping the CUDA Juggernaut Rolling Ft Verda, Formerly DataCrunch

(chipsandcheese.com) by rbanffy | view | 0 comments

Nvidia cuTile: Python DSL and a new IR for tile-based CUDA kernels

(github.com) by ashvardanian | view | 0 comments

NVIDIA CUDA Tile programming model

(developer.nvidia.com) by tanelpoder | view | 0 comments

Show HN: modal-cuda – CLI to run CUDA .cu programs on Modal GPUs

(github.com) by Sai_Praneeth | view | 0 comments

AI-Written CUDA Kernels Outperform Nvidia's Best Matmul Library

(rohan-paul.com) by dzign | view | 0 comments

An Agent Framework with Hardware Feedback for CUDA Kernel Optimization

(arxiv.org) by PaulHoule | view | 0 comments

Understanding the CUDA Compiler and PTX with a Top-K Kernel

(blog.alpindale.net) by mfiguiere | view | 0 comments

AMD calls CUDA a 'non-event'

(pcgamer.com) by neilfrndes | view | 0 comments

Performant C/CUDA inference engine for Qwen 3.6 35B on RTX 5090 / Blackwell

(github.com) by ambuds | view | 0 comments

Show HN: NanoEuler – GPT-2 scale model in pure C/CUDA from scratch

(github.com) by vforno | view | 0 comments

Show HN: CUDA Profiler for Production Inference

(github.com) by npgraph | view | 0 comments

Evaluating CUDA Tile for AI Workloads on Hopper and Blackwell GPUs

(arxiv.org) by matt_d | view | 0 comments

CUDA Released in Basic

(developer.nvidia.com) by apples2apples | view | 0 comments

Nvidia CUDA Tile

(developer.nvidia.com) by apples2apples | view | 0 comments

CUDA Tile

(techpowerup.com) by dagmx | view | 0 comments

Show HN: Free GPUs in your terminal for learning CUDA

(github.com) by RohanAdwankar | view | 0 comments

CUDA-Q Back Ends: Quantum Hardware (QPU)

(nvidia.github.io) by westurner | view | 3 comments

Nvprobe – Open-source, zero-setup CLI for CUDA benchmarks

(github.com) by SergioZ3R0 | view | 0 comments

Bw24 – from scratch rust+CUDA inference, every kernel tuned for sm_120a

(github.com) by anotherCodder | view | 0 comments

Running CUDA on Apple GPUs

(twitter.com) by abhinavsns | view | 0 comments

Eliminating Conda/CUDA dependency hell in computational biology pipelines

(github.com) by LionTurtle13 | view | 0 comments

CUDA-accelerated program to search Minecraft seeds for Mooshroom Island biomes

(github.com) by Tmpod | view | 0 comments

Show HN: Trellis2.c – Local 3D generation with Vulkan and CUDA

(github.com) by wimaxs | view | 0 comments

Fusing a 27B ternary LLM's whole decode step into one CUDA kernel

(twitter.com) by Jr23_xd | view | 0 comments

Alternative(s) to run CUDA on non-Nvidia hardware

(hpcwire.com) by alok-g | view | 0 comments

Optimizing CUDA Like a Human: Micro-Profiling Tools as Expert Surrogates For

(hgpu.org) by ibobev | view | 0 comments

What If Java Apps Could Access CUDA Ecosystem Gracefully

(tornadovm.org) by mikepapadim | view | 0 comments

1970 Plymouth Hemi 'CUDA

(knuckledustchronicles.com) by frobinson47 | view | 0 comments

Show HN: A 100% branchless, CUDA-native AI guardrail kernel written in C++20

(github.com) by PJHkorea | view | 0 comments

Show HN: Real-time n-body tree code in CUDA

(github.com) by lechebs | view | 0 comments

Reverse-engineering Nvidia's CUDA-checkpoint for faster cold starts

(blog.doubleword.ai) by ilreb | view | 0 comments

Show HN: A free, GPU-accelerated Texas Hold'em GTO solver in C++/CUDA

(bupticybee.github.io) by bupticybee | view | 0 comments

Zluda (CUDA-compatible runtime for AMD) loses funding again, gains 32bit compat

(vosen.github.io) by indrora | view | 0 comments

Zluda 6 release (run unmodified CUDA applications on non-Nvidia GPUs)

(vosen.github.io) by Tiberium | view | 0 comments

What happens when you run a CUDA kernel?

(fergusfinn.com) by mezark | view | 0 comments

Optimizing a CUDA FSST decompression kernel

(polarsignals.com) by asubiotto | view | 0 comments

CUDA Profiler for Production Inference

(graphsignal.com) by npgraph | view | 0 comments

Running a 35B MoE model on a 2017 AMD RX 580 8GB via Vulkan (no ROCm/CUDA)

(github.com) by aivisionslab | view | 0 comments

Fast Great-Circle Distance Calculation in CUDA C++

(developer.nvidia.com) by Alien1Being | view | 0 comments

Running local AI on AMD RX 580 (2017 GPU) using Vulkan – no CUDA, no ROCm

(setup-ia-local-rx580-vulkan.web.app) by aivisionslab | view | 0 comments

Show HN: FlashQwen – A from-scratch CUDA inference engine for Qwen3

(github.com) by langtang1996 | view | 0 comments