cuda 13.0 release overview
cuda 13.0 was released in august 2025 with significant architectural changes and unified arm platform support.
overview
cuda 13.0 introduces:
- unified arm platform installation
- blackwell gpu support
- architectural deprecations (maxwell, pascal, volta)
- fatbin compression switch from lz4 to zstd
- shared memory register spilling
platform changes
unified arm support
cuda 13.0 consolidates arm support across platforms:
- single installer for all arm architectures
- arm64-sbsa unified support
- grace hopper (gh200) optimizations
- jetson orin excluded from initial release
dropped architectures
removed support for:
- maxwell (gtx 750, gtx 900 series) - compute 5.x
- pascal (gtx 1000 series) - compute 6.x
- volta (titan v, quadro gv100) - compute 7.0
nvidia states these architectures are “feature-complete with no further enhancements planned.”
new features
compiler improvements
- llvm clang 20 support
- gcc 15 support
- compile time advisor (ctadvisor) tool
- 32-byte vector type alignment for blackwell
performance enhancements
- register spilling to shared memory
- zstd fatbin compression (smaller binaries)
- improved cuda graph performance
- enhanced error reporting for cuda apis
api additions
// new host memory support
cuMemCreate() // with host support
cudaMallocAsync() // host allocation support
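the listing above is schematic. as a rough illustration of what host-side stream-ordered allocation could look like, here is a minimal sketch built on the existing memory-pool runtime api; it assumes cuda 13.0 accepts cudaMemLocationTypeHost in cudaMemPoolProps, which is not verified here.
// sketch: stream-ordered host allocation via a host-located memory pool
// assumption: cuda 13.0 accepts a host location in cudaMemPoolProps
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    cudaMemPoolProps props = {};
    props.allocType     = cudaMemAllocationTypePinned;
    props.location.type = cudaMemLocationTypeHost;  // host-backed pool (assumption)
    props.location.id   = 0;

    cudaMemPool_t pool;
    if (cudaMemPoolCreate(&pool, &props) != cudaSuccess) {
        fprintf(stderr, "host-located pools not supported on this setup\n");
        return 1;
    }

    void *buf = NULL;
    cudaMallocFromPoolAsync(&buf, 1 << 20, pool, stream);  // 1 MiB, stream-ordered
    cudaFreeAsync(buf, stream);
    cudaStreamSynchronize(stream);

    cudaMemPoolDestroy(pool);
    cudaStreamDestroy(stream);
    return 0;
}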
driver requirements
minimum driver: r580 series (580.65.06+)
# verify driver version
nvidia-smi | grep "Driver Version"
# must show 580.xx or higher
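as a programmatic cross-check (not from the release notes), cudaDriverGetVersion reports the newest cuda version the installed driver can run; it should return 13000 or higher for cuda 13.0.
// quick check that the installed driver can run cuda 13.0 binaries
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    int driver = 0, runtime = 0;
    cudaDriverGetVersion(&driver);    // highest cuda version the driver supports, e.g. 13000
    cudaRuntimeGetVersion(&runtime);  // cuda runtime this binary was built against

    printf("driver supports cuda %d.%d, runtime is %d.%d\n",
           driver / 1000, (driver % 1000) / 10,
           runtime / 1000, (runtime % 1000) / 10);

    if (driver < 13000)
        printf("driver too old for cuda 13.0 - install the r580 series\n");
    return 0;
}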
distribution support
newly supported
- red hat enterprise linux 10/9.6
- debian 12.10
- fedora 42
- rocky linux 9.6/10.0
- ubuntu 24.04 lts
- ubuntu 22.04 lts (continued)
not supported
- ubuntu 25.04 (non-lts)
- ubuntu 25.10 (non-lts)
- ubuntu 23.10 (non-lts, eol)
note: nvidia typically only supports ubuntu lts releases. debian 12.10 is supported despite being a point release.
dropped
- ubuntu 20.04 lts
- older rhel/centos versions
migration guide
checking gpu compatibility
# list gpu compute capability
nvidia-smi --query-gpu=name,compute_cap --format=csv
# supported architectures (compute 7.5+):
# - turing (rtx 20xx)
# - ampere (rtx 30xx)
# - ada lovelace (rtx 40xx)
# - hopper (h100)
# - blackwell (b100/b200)
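the same check can be done from code; a minimal sketch using cudaGetDeviceProperties, with the 7.5 threshold mirroring the list above:
// sketch: verify every visible gpu meets the compute 7.5+ requirement
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    int count = 0;
    cudaGetDeviceCount(&count);

    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        int cc = prop.major * 10 + prop.minor;
        printf("gpu %d: %s (compute %d.%d) - %s\n",
               i, prop.name, prop.major, prop.minor,
               cc >= 75 ? "supported by cuda 13.0" : "dropped in cuda 13.0");
    }
    return 0;
}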
code migration
vector type alignment
// old (may cause issues on blackwell)
struct float4 { float x, y, z, w; };
// cuda 13.0 (32-byte aligned)
struct __align__(32) float4 { float x, y, z, w; };
deprecated apis
- multi-device launch apis removed
- legacy vector types deprecated
- nvprof and nvidia visual profiler removed
pytorch compatibility
pytorch cuda 13.0 support status (august 2025):
- tracking issue: pytorch#159779
- release engineering evaluating build complexity
- potential removal of some cuda 12.x builds to make room for cuda 13.0 binaries
installation
docker (recommended)
prerequisite: nvidia container toolkit is required for gpu access in containers
# install nvidia container toolkit (ubuntu/debian)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
# cuda 13.0 images are now available with version 13.0.0
# base image (minimal, no nvidia-smi)
docker pull nvidia/cuda:13.0.0-base-ubuntu24.04
# runtime image (includes cuda runtime and nvidia-smi)
docker pull nvidia/cuda:13.0.0-runtime-ubuntu24.04
# devel image (includes nvcc compiler and development tools)
docker pull nvidia/cuda:13.0.0-devel-ubuntu24.04
# test gpu access (requires nvidia container toolkit)
docker run --rm --gpus all nvidia/cuda:13.0.0-runtime-ubuntu24.04 nvidia-smi
# check nvcc version (devel image only)
docker run --rm --gpus all nvidia/cuda:13.0.0-devel-ubuntu24.04 nvcc --version
available variants:
- base: minimal cuda libraries only
- runtime: cuda runtime + nvidia-smi
- devel: full cuda toolkit with nvcc compiler
- cudnn-runtime: runtime + cudnn libraries
- cudnn-devel: devel + cudnn libraries
- tensorrt-runtime: runtime + tensorrt
- tensorrt-devel: devel + tensorrt
supported operating systems:
- ubuntu 24.04, 22.04
- rocky linux 8, 9, 10
- ubi (red hat universal base image) 8, 9, 10
- oracle linux 8, 9
- opensuse 15
native installation
# download the cuda 13.0 runfile installer
# (select linux / x86_64 / runfile (local) at https://developer.nvidia.com/cuda-downloads)
# install driver first (if needed)
sudo apt install nvidia-driver-580
# install cuda toolkit
sudo sh cuda_13.0_linux.run --toolkit --silent
jax with cuda 13.0
warning august 2025: jax-cuda13-plugin (0.0.1rc0) is a placeholder package with no actual cuda 13 support
# the cuda13 extra doesn't exist in jax 0.6.2
uv pip install "jax[cuda13]" # warning: no extra named 'cuda13'
# jax 0.7.0 is available but cuda13 support not implemented
uv pip install --prerelease=allow "jax>=0.7.0"
# the cuda13 plugin is just a placeholder (empty wheel)
uv pip install --prerelease=allow jax-cuda13-plugin==0.0.1rc0
# installs but contains no actual code
# for now, use cuda 12.x builds
uv pip install "jax[cuda12]" # cuda 12.x still recommended
current status: jax cuda 13.0 support is not yet functional. the plugin package exists on pypi but is an empty placeholder. continue using cuda 12.x builds until official support is released.
performance considerations
fatbin compression
cuda 13.0 switches from lz4 to zstd:
- ~20% smaller fatbin files
- slightly slower initial load
- better for distribution/containers
shared memory spilling
new feature allows register spillage to shared memory:
- reduces local memory pressure
- improves kernel occupancy
- automatic optimization
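whether a given kernel spills at all shows up in ptxas's resource-usage report. the sketch below is only illustrative (the kernel name and the artificially high register pressure are made up), but compiling it with -Xptxas -v prints the spill store/load counts that this feature targets.
// spill_check.cu - compile with: nvcc -arch=sm_90 -Xptxas -v spill_check.cu
// ptxas prints "bytes spill stores / spill loads" per kernel; per the notes above,
// cuda 13.0 can place eligible spills in shared memory instead of local memory.
#include <cuda_runtime.h>

__global__ void __launch_bounds__(256) heavy_kernel(float *out, const float *in, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    // deliberately high register pressure so the report shows spills
    float acc[32];
    for (int k = 0; k < 32; ++k)
        acc[k] = in[(i + k) % n] * (k + 1);

    float sum = 0.f;
    for (int k = 0; k < 32; ++k)
        sum += acc[k] * acc[(k * 7) % 32];
    out[i] = sum;
}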
framework support
update august 2025: no major ml frameworks have functional cuda 13.0 support yet
| framework | cuda 13.0 status | verified sources |
|---|---|---|
| pytorch | no official support | github #159779 - discussing build complexity |
| tensorflow | no official support | latest nvidia containers use cuda 12.8 (per nvidia docs) |
| jax | placeholder only | jax-cuda13-plugin 0.0.1rc0 is an empty package - no actual implementation |
current supported cuda versions (august 2025):
- pytorch: cuda 11.8, 12.1, 12.4 (planning 12.6 for v2.6)
- tensorflow: up to cuda 12.8 in nvidia optimized containers
- jax: cuda 12.x (the cuda 13 plugin exists on pypi only as an empty placeholder)
troubleshooting
common issues
unsupported gpu error
- check compute capability >= 7.5
- maxwell/pascal/volta no longer supported
driver version mismatch
# requires r580+ driver
nvidia-smi
# should show 580.xx+
framework compatibility
- continue using cuda 12.x or 11.8 builds
- monitor framework release notes for cuda 13.0 support
- pytorch tracking: github #159779
vllm dependency chain
- vllm depends on pytorch and cupy
- neither pytorch nor cupy support cuda 13.0 yet
- vllm cuda 13.0 support blocked until dependencies update
future roadmap
risc-v support
nvidia announced cuda coming to risc-v:
- no timeline in cuda 13.0
- part of broader architecture expansion
- following arm unification pattern
potential cuda 14.0
based on deprecation patterns:
- turing (compute 7.5) likely next removal target
- further arm platform integration
- potential risc-v preview