Setting Up a CUDA/cuDNN/PyTorch Environment with Conda

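I plan to readjust the Python / PyTorch / CUDA / cuDNN versions on the server. The old environment stuck to fairly old versions to stay compatible with some legacy projects (TVM, to name names), which caused all sorts of compatibility problems for new projects. This time everything is rebuilt inside Conda.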

Installing Anaconda3

root@pve:~/share/project/.env/Anaconda3# chmod 777 Anaconda3-2024.10-1-Linux-x86_64.sh 
root@pve:~/share/project/.env/Anaconda3# ./Anaconda3-2024.10-1-Linux-x86_64.sh
...

Do you accept the license terms? [yes|no]
>>> yes
...

You can undo this by running `conda init --reverse $SHELL`? [yes|no]
[no] >>> no
...

Thank you for installing Anaconda3!

Environment variables

root@pve:~/share/project/.env/Anaconda3# vim /etc/profile
>>> export PATH=/root/anaconda3/bin:$PATH
root@pve:~/share/project/.env/Anaconda3# source /etc/profile
root@pve:~/share/project/.env/Anaconda3# conda -V
conda 24.9.2

Installing Python

Create a Python 3.11 environment

root@pve:~/share/project/.env/Anaconda3# conda create -n torch0 python=3.11

Initialize the shell

root@pve:~/share/project/.env/Anaconda3# conda init
root@pve:~/share/project/.env/Anaconda3# source /root/.bashrc

Use python3 here to see the version Conda actually installed, because python was aliased in .bashrc when the server was first set up.

root@pve:~/share/project/.env/Anaconda3# conda activate torch0
(torch0) root@pve:~/share/project/.env/Anaconda3# python3 -V
Python 3.11.11
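
Because a shell alias can hide which interpreter is really in use, it may help to ask the interpreter itself. The following is a minimal sketch using only the standard library; the function name `interpreter_report` is made up for illustration, and it assumes Conda exports `CONDA_DEFAULT_ENV` on activation, as current Conda releases do.

```python
import os
import shutil
import sys


def interpreter_report() -> dict:
    """Collect facts about the interpreter that is actually running (hypothetical helper)."""
    return {
        "executable": sys.executable,
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        "prefix": sys.prefix,
        # Conda sets CONDA_DEFAULT_ENV on activation; None means no env is active.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        # What a bare `python3` resolves to via PATH (shell aliases are not visible here).
        "python3_on_path": shutil.which("python3"),
    }


if __name__ == "__main__":
    for key, value in interpreter_report().items():
        print(f"{key}: {value}")
```

Run inside the activated environment with python3, `prefix` should point at the torch0 environment directory rather than a system Python.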

Installing CUDA

Install the CUDA Toolkit: first check the highest CUDA version the current driver supports, then run

(torch0) root@pve:~/share/project/.env/Anaconda3# conda install nvidia/label/cuda-12.2.2::cuda-toolkit
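
The driver check mentioned above is usually done by reading the `CUDA Version` field in the header that `nvidia-smi` prints. The sketch below automates that, assuming `nvidia-smi` is on PATH; the helper names are hypothetical, and the plain tuple comparison is a simplification that ignores CUDA's minor-version compatibility rules.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_driver_cuda_version(smi_output: str) -> Optional[tuple]:
    """Extract (major, minor) from the 'CUDA Version: X.Y' field of the nvidia-smi header."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def driver_cuda_version() -> Optional[tuple]:
    """Run nvidia-smi if it is on PATH; return None when it is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    if result.returncode != 0:
        return None
    return parse_driver_cuda_version(result.stdout)


if __name__ == "__main__":
    wanted = (12, 2)  # toolkit version installed above (12.2.2)
    supported = driver_cuda_version()
    if supported is None:
        print("could not determine the driver's CUDA version")
    else:
        # Simplified check: the toolkit should not be newer than what the driver reports.
        print(f"driver supports up to CUDA {supported[0]}.{supported[1]}; "
              f"toolkit {wanted[0]}.{wanted[1]} ok: {wanted <= supported}")
```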

Search for cuDNN and install it

(torch0) root@pve:~/share/project/.env/Anaconda3# conda search cudnn
Loading channels: done
# Name Version Build Channel
cudnn 7.0.5 cuda8.0_0 pkgs/main
cudnn 8.9.2.26 cuda11_0 pkgs/main
cudnn 9.1.1.17 cuda12_1 pkgs/main
...

(torch0) root@pve:~/share/project/.env/Anaconda3# conda install cudnn=9.1
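
Picking the right row from that listing comes down to matching the build string against the CUDA major version in use. As a rough sketch, the filter below works on the three rows shown above; it assumes the number right after `cuda` in the build string is the CUDA major version, and `cudnn_builds_for` is a hypothetical helper, not part of Conda.

```python
import re

# Rows copied from the `conda search cudnn` output above
# (columns: name, version, build, channel).
ROWS = """\
cudnn 7.0.5 cuda8.0_0 pkgs/main
cudnn 8.9.2.26 cuda11_0 pkgs/main
cudnn 9.1.1.17 cuda12_1 pkgs/main
"""


def cudnn_builds_for(cuda_major: int, listing: str) -> list:
    """Return (version, build) pairs whose build string starts with cuda<major>."""
    matches = []
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) != 4 or fields[0] != "cudnn":
            continue  # skip headers, blank lines, and elided rows
        _, version, build, _ = fields
        found = re.match(r"cuda(\d+)", build)
        if found and int(found.group(1)) == cuda_major:
            matches.append((version, build))
    return matches


if __name__ == "__main__":
    print(cudnn_builds_for(12, ROWS))  # [('9.1.1.17', 'cuda12_1')]
```

For CUDA 12 this leaves only the 9.1.x build, which is the one installed with `cudnn=9.1`.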

Installing PyTorch

Follow the official PyTorch site to install PyTorch (p.s. ROCm support keeps getting better these days, AMD YES)

conda install pytorch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 pytorch-cuda=12.1 -c pytorch -c nvidia

The first install hung at 99%. After pressing Ctrl + C and reinstalling, the following error appeared

CondaVerificationError: The package for pytorch located at /root/anaconda3/pkgs/pytorch-2.5.1-py3.11_cuda12.1_cudnn9.1.0_0
appears to be corrupted. The path 'lib/python3.11/site-packages/torch/include/ATen/cuda/CUDAApplyUtils.cuh'
specified in the package manifest cannot be found.

Clean up the corrupted packages

(torch0) root@pve:~/share/project/.env/Anaconda3# conda clean --packages
Will remove 158 (10.90 GB) package(s).
Proceed ([y]/n)? y
# reinstall

Hint: the install sits at `Solving environment: done` for a long time here; be patient

Test it, making sure to use python3

import torch

print(torch.backends.cudnn.version())  # 90100
print(torch.__version__)  # 2.5.1
print(torch.version.cuda)  # 12.1
print(torch.cuda.is_available())  # True
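
The version prints confirm that the libraries load, but not that a kernel actually runs. A slightly fuller check might look like the sketch below; `check_torch_stack` is a hypothetical helper, the 256x256 size and tolerances are arbitrary choices, and it degrades to a short report when PyTorch or a GPU is missing.

```python
import importlib.util


def check_torch_stack() -> dict:
    """Report PyTorch/CUDA/cuDNN versions and run one matmul on the GPU when available."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}

    import torch

    report = {
        "torch_installed": True,
        "torch": torch.__version__,
        "cuda": torch.version.cuda,  # None on CPU-only builds
        "cudnn": torch.backends.cudnn.version(),  # None when cuDNN is unavailable
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["device"] = torch.cuda.get_device_name(0)
        a = torch.rand(256, 256, device="cuda")
        b = torch.rand(256, 256, device="cuda")
        # Compare the GPU product against the same product computed on the CPU.
        report["matmul_ok"] = torch.allclose(
            (a @ b).cpu(), a.cpu() @ b.cpu(), rtol=1e-4, atol=1e-3
        )
    return report


if __name__ == "__main__":
    for key, value in check_torch_stack().items():
        print(f"{key}: {value}")
```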

References

  1. Cuda Toolkit | Anaconda.org
  2. Previous PyTorch Versions | PyTorch
  3. Installing Anaconda on Linux (detailed, with screenshots) | CSDN Blog
  4. Configuring a CUDA + cuDNN + PyTorch deep learning environment in a conda environment | CSDN Blog