Git and GitHub
Ubuntu usually ships with Git preinstalled, so there is normally no need to install it by hand; it can be used right away.
git config --global user.name "SSH keys Name"
git config --global user.email "SSH keys Email"

ssh-keygen -t rsa -C "Email of Github Account"

(base) houjinliang@3080server:~/userdoc/d2cv$ git config --global user.name 'hjl_3080server'
(base) houjinliang@3080server:~/userdoc/d2cv$ git config --global user.email 'cosmicdustycn@outlook.com'
(base) houjinliang@3080server:~/userdoc/d2cv$ ssh-keygen -t rsa -C "cosmicdustycn@outlook.com"
Generating public/private rsa key pair.
Enter file in which to save the key (/mnt/houjinliang/.ssh/id_rsa):
Created directory '/mnt/houjinliang/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /mnt/houjinliang/.ssh/id_rsa.
Your public key has been saved in /mnt/houjinliang/.ssh/id_rsa.pub.
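There is no need to worry that this Git user configuration will affect other users on the server: git config --global writes to ~/.gitconfig in your own home directory. The key pair generated by ssh-keygen is likewise stored under your own account's home directory.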
(dlpy310pth113) houjinliang@3080server:~/.ssh$ pwd
/mnt/houjinliang/.ssh
(dlpy310pth113) houjinliang@3080server:~/.ssh$ ll
总用量 20
drwx------  2 houjinliang houjinliang 4096 11月  1 10:19 ./
drwxr-xr-x 12 houjinliang houjinliang 4096 11月  1 10:17 ../
-rw-------  1 houjinliang houjinliang 1675 11月  1 10:17 id_rsa
-rw-r--r--  1 houjinliang houjinliang  407 11月  1 10:17 id_rsa.pub
-rw-r--r--  1 houjinliang houjinliang  444 11月  1 10:19 known_hosts
Copy the contents of the id_rsa.pub file into the SSH Keys section of GitHub's Settings. The connection can then be tested with ssh -T.
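A minimal sketch for printing the public key so it can be pasted into GitHub. The function name `show_pubkey` is hypothetical and not part of the original notes; it assumes the key sits at the ssh-keygen default path `~/.ssh/id_rsa.pub` unless another path is passed.

```shell
# show_pubkey is a hypothetical helper name (an assumption, not from the notes).
# It prints a public key file, defaulting to the ssh-keygen default location.
show_pubkey() {
    local key="${1:-$HOME/.ssh/id_rsa.pub}"
    if [ ! -f "$key" ]; then
        echo "no public key found at $key" >&2
        return 1
    fi
    # Only the .pub file is ever printed; the private key stays untouched.
    cat "$key"
}
```

Calling `show_pubkey` with no argument prints the default key; only the `.pub` file should ever be shared, never `id_rsa` itself.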
ssh -T git@github.com
Hi murphyhoucn! You've successfully authenticated, but GitHub does not provide shell access.

(base) houjinliang@3080server:~/userdoc$ git clone git@github.com:murphyhoucn/DeepLearningforCV.git
(base) houjinliang@3080server:~/userdoc/DeepLearningforCV$ git status
(base) houjinliang@3080server:~/userdoc/DeepLearningforCV$ git add .
(base) houjinliang@3080server:~/userdoc/DeepLearningforCV$ git commit -m "add new file"
(base) houjinliang@3080server:~/userdoc/DeepLearningforCV$ git push
Checking GPU usage
nvidia-smi
gpustat
GitHub - wookayin/gpustat: 📊 A simple command-line utility for querying and monitoring GPU status
(dlpy310pth113) houjinliang@3080server:~/userdoc$ pip install gpustat
(dlpy310pth113) houjinliang@3080server:~/userdoc$ gpustat
nvitop
GitHub - XuehaiPan/nvitop: An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.
nvitop: 史上最强GPU性能实时监测工具 (the most powerful real-time GPU monitoring tool ever) - Zhihu (zhihu.com)
(dlpy310pth113) houjinliang@3080server:~$ pip install nvitop
Requirement already satisfied: nvitop in ./miniconda3/envs/dlpy310pth113/lib/python3.10/site-packages (1.3.0)
Requirement already satisfied: nvidia-ml-py<12.536.0a0,>=11.450.51 in ./miniconda3/envs/dlpy310pth113/lib/python3.10/site-packages (from nvitop) (12.535.108)
Requirement already satisfied: psutil>=5.6.6 in ./miniconda3/envs/dlpy310pth113/lib/python3.10/site-packages (from nvitop) (5.9.5)
Requirement already satisfied: cachetools>=1.0.1 in ./miniconda3/envs/dlpy310pth113/lib/python3.10/site-packages (from nvitop) (5.3.1)
Requirement already satisfied: termcolor>=1.0.0 in ./miniconda3/envs/dlpy310pth113/lib/python3.10/site-packages (from nvitop) (2.3.0)
Clash for Linux
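Ubuntu配置 命令行Clash 教程 (tutorial: configuring command-line Clash on Ubuntu) - Zhihu (zhihu.com)
终端使用代理加速的正确方式(Clash) (the right way to speed up the terminal with a proxy, using Clash) | Ln’s Blog (weilining.github.io)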
2024.01.10
gunzip clash-linux-amd64-v1.18.0.gz
mv clash-linux-amd64-v1.18.0 clash
chmod u+x clash
./clash

Write the subscription content into ~/.config/clash/config.yaml
Add to `~/.bashrc`:
function proxy () {
    export http_proxy=http://127.0.0.1:7890
    export https_proxy=$http_proxy
    echo -e "proxy on!"
}
function unproxy () {
    unset http_proxy https_proxy
    echo -e "proxy off"
}
(base) houjinliang@3080server:~/userdoc$ source ~/.bashrc
(base) houjinliang@3080server:~/userdoc$ proxy
proxy on!
(base) houjinliang@3080server:~/userdoc$ unproxy
proxy off
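Since the proxy switch only sets and unsets environment variables, it can be handy to check the current state without toggling it. The following is a small sketch; `proxy_status` is a hypothetical name that does not appear in the original notes, and it assumes the same `http_proxy` / `https_proxy` variables used by the functions in `~/.bashrc`.

```shell
# proxy_status is a hypothetical companion to the proxy/unproxy functions.
# It only reports what the current shell has exported; it changes nothing.
proxy_status() {
    if [ -n "${http_proxy:-}" ] || [ -n "${https_proxy:-}" ]; then
        echo "proxy on: http_proxy=${http_proxy:-unset} https_proxy=${https_proxy:-unset}"
    else
        echo "proxy off"
    fi
}
```

Because the variables live in the current shell only, a new terminal starts with the proxy off until `proxy` is run again.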
(base) houjinliang@3080server:~/userdoc$ wget www.zhihu.com
URL transformed to HTTPS due to an HSTS policy
--2024-01-10 15:33:59--  https://www.zhihu.com/
正在连接 127.0.0.1:7890... 已连接。
已发出 Proxy 请求,正在等待回应... 302 Found
位置://www.zhihu.com/signin?next=%2F [跟随至新的 URL]
URL transformed to HTTPS due to an HSTS policy
--2024-01-10 15:33:59--  https://www.zhihu.com/signin?next=%2F
再次使用存在的到 www.zhihu.com:443 的连接。
已发出 Proxy 请求,正在等待回应... 200 OK
长度: 39879 (39K) [text/html]
正在保存至: “index.html”

index.html 100%[===================================================================================================================>] 38.94K --.-KB/s 用时 0.04s

2024-01-10 15:33:59 (944 KB/s) - 已保存 “index.html” [39879/39879])

(base) houjinliang@3080server:~/userdoc$ wget www.google.com
--2024-01-10 15:34:14--  http://www.google.com/
正在连接 127.0.0.1:7890... 已连接。
已发出 Proxy 请求,正在等待回应... 200 OK
长度: 未指定 [text/html]
正在保存至: “index.html.1”

index.html.1 [ <=> ] 18.72K --.-KB/s 用时 0.07s

2024-01-10 15:34:16 (257 KB/s) - “index.html.1” 已保存 [19169]
3080Server - MMDetection
Ubuntu 18.04.6 LTS
gcc version 7.5.0
CUDA 11.3
cuDNN 8.9.5
MMDetection
The versions were chosen with reference to this image:
open-mmlab/mmdetection3d/mmdetection3d-1.1: mmdetection3d-1.1版本 (mmdetection3d-1.1 release) - CG (codewithgpu.com)
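CUDA 11.3.1 & CUDNN 8.9.5
CUDA 11.6 was installed previously, but that version later seemed a bit too new; after looking at some examples, I decided to go back to CUDA 11.3. The first step is to uninstall CUDA 11.6. After searching, I did not find a method that worked, so I decided to simply run rm -rf cuda-11.6, deleting the CUDA files and then reinstalling.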
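The runfile installer's own summary (reproduced further down) mentions a `cuda-uninstaller` program in the toolkit's `bin` directory. The sketch below prefers that uninstaller when it exists and falls back to plain deletion otherwise. The function name `remove_cuda_toolkit` and the `--run` switch are hypothetical; by default the function only prints what it would do.

```shell
# remove_cuda_toolkit is a hypothetical helper (an assumption, not from the notes).
# Default is a dry run; pass "--run" as the second argument to really act.
remove_cuda_toolkit() {
    local dir="$1" mode="${2:-}"
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 1
    fi
    if [ -x "$dir/bin/cuda-uninstaller" ]; then
        # The runfile installer places its uninstaller in <toolkit>/bin.
        if [ "$mode" = "--run" ]; then
            "$dir/bin/cuda-uninstaller"
        else
            echo "would run: $dir/bin/cuda-uninstaller"
        fi
    else
        # Fallback used in the notes: delete the toolkit directory.
        if [ "$mode" = "--run" ]; then
            rm -rf -- "$dir"
        else
            echo "would run: rm -rf -- $dir"
        fi
    fi
}
```

For example, `remove_cuda_toolkit ~/cuda-11.6` prints the planned action, and adding `--run` carries it out.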
CUDA Toolkit 11.3 Update 1 Downloads | NVIDIA Developer
wget https://developer.download.nvidia.com/compute/cuda/11.3.1/local_installers/cuda_11.3.1_465.19.01_linux.run
sudo sh cuda_11.3.1_465.19.01_linux.run

非root用户安装cuda与cudnn (installing CUDA and cuDNN as a non-root user) - Zhihu (zhihu.com)

(base) houjinliang@3080server:~/userdoc/cuda_and_cudnn$ sh ./cuda_11.3.1_465.19.01_linux.run
= Summary =
Driver:   Not Selected
Toolkit:  Installed in /mnt/houjinliang/cuda-11.3/
Samples:  Not Selected

Please make sure that
PATH includes /mnt/houjinliang/cuda-11.3/bin
LD_LIBRARY_PATH includes /mnt/houjinliang/cuda-11.3/lib64, or, add /mnt/houjinliang/cuda-11.3/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /mnt/houjinliang/cuda-11.3/bin
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 465.00 is required for CUDA 11.3 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
    sudo <CudaInstaller>.run --silent --driver

Logfile is /tmp/cuda-installer.log
vim ~/.bashrc

```
# CUDA_HOME must be a single directory, so it is set directly rather than appended to
export CUDA_HOME=/mnt/houjinliang/cuda-11.3
export PATH=$PATH:/mnt/houjinliang/cuda-11.3/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/mnt/houjinliang/cuda-11.3/lib64
```

(base) houjinliang@3080server:~$ source ~/.bashrc
(base) houjinliang@3080server:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_May__3_19:15:13_PDT_2021
Cuda compilation tools, release 11.3, V11.3.109
Build cuda_11.3.r11.3/compiler.29920130_0
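Appending with `PATH=$PATH:...` adds a duplicate entry every time `~/.bashrc` is sourced again, and appending to an empty variable leaves a leading colon. The sketch below shows one way to avoid both. `path_append` is a hypothetical helper name, not something from the original notes.

```shell
# path_append is a hypothetical helper: append a directory to a
# colon-separated variable only if it is not already listed there.
path_append() {
    local var="$1" dir="$2" current
    # Read the current value of the variable whose name is in $var.
    eval "current=\${$var:-}"
    case ":$current:" in
        *":$dir:"*) ;;  # already present, nothing to do
        *) eval "export $var=\"\${current:+\$current:}\$dir\"" ;;
    esac
}

# Possible use in ~/.bashrc (paths taken from the installer summary):
#   export CUDA_HOME=/mnt/houjinliang/cuda-11.3
#   path_append PATH "$CUDA_HOME/bin"
#   path_append LD_LIBRARY_PATH "$CUDA_HOME/lib64"
```

The `${current:+$current:}` expansion inserts the separator only when the variable already has a value, which is what prevents the stray leading colon.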
(base) houjinliang@3080server:~/userdoc/cuda_and_cudnn$ tar xvJf cudnn-linux-x86_64-8.9.5.29_cuda11-archive.tar.xz

(py38mmdetection) houjinliang@3080server:~/userdoc/cuda_and_cudnn/cudnn-linux-x86_64-8.9.5.29_cuda11-archive$ ll
总用量 48
drwxr-xr-x 4 houjinliang houjinliang  4096 8月  3  2022 ./
drwxrwxr-x 3 houjinliang houjinliang  4096 1月  5 16:32 ../
drwxr-xr-x 2 houjinliang houjinliang  4096 8月  3  2022 include/
drwxr-xr-x 2 houjinliang houjinliang  4096 8月  3  2022 lib/
-rw-r--r-- 1 houjinliang houjinliang 28994 8月  3  2022 LICENSE

(py38mmdetection) houjinliang@3080server:~/userdoc/cuda_and_cudnn/cudnn-linux-x86_64-8.9.5.29_cuda11-archive$ cp lib/* ~/cuda-11.3/lib64/
(py38mmdetection) houjinliang@3080server:~/userdoc/cuda_and_cudnn/cudnn-linux-x86_64-8.9.5.29_cuda11-archive$ cp include/* ~/cuda-11.3/include
chmod +x ~/cuda-11.3/include/cudnn.h
chmod +x ~/cuda-11.3/lib64/libcudnn*

(base) houjinliang@3080server:~$ cat ~/cuda-11.3/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
--
/* cannot use constexpr here since this is a C-only file */
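The `grep CUDNN_MAJOR -A 2` check reads the version macros out of `cudnn_version.h`. As a rough sketch, the same information can be condensed into one `MAJOR.MINOR.PATCHLEVEL` string. `cudnn_version` is a hypothetical function name, and the code assumes the header defines the macros in the form `#define CUDNN_MAJOR <n>`, `#define CUDNN_MINOR <n>` and `#define CUDNN_PATCHLEVEL <n>`.

```shell
# cudnn_version is a hypothetical helper (an assumption, not from the notes).
# It prints MAJOR.MINOR.PATCHLEVEL as read from a cudnn_version.h file.
cudnn_version() {
    local header="${1:-$HOME/cuda-11.3/include/cudnn_version.h}"
    if [ ! -f "$header" ]; then
        echo "header not found: $header" >&2
        return 1
    fi
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { print major "." minor "." patch }
    ' "$header"
}
```

Passing a different header path as the first argument lets the same function check another toolkit directory.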
PyTorch 1.11
(base) houjinliang@3080server:~$ conda create -n py38mmdetection python=3.8 -y
(base) houjinliang@3080server:~$ conda activate py38mmdetection
(py38mmdetection) houjinliang@3080server:~$ conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch

(py38mmdetection) houjinliang@3080server:~$ python
Python 3.8.18 (default, Sep 11 2023, 13:40:15)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.cuda.is_available())
True

Aliyun mirror
pip config set global.index-url https://mirrors.aliyun.com/pypi/simple

mmdet installation
开始你的第一步 (Get Started) - MMDetection 3.3.0 文档 (documentation)

3080Server - MMYOLO
Overview - MMYOLO 0.6.0 documentation
(base) houjinliang@3080server:~$ conda create -n py38mmyolo python=3.8
(base) houjinliang@3080server:~$ conda activate py38mmyolo

(py38mmyolo) houjinliang@3080server:~$ pip config list
global.index-url='https://mirrors.aliyun.com/pypi/simple'

(py38mmyolo) houjinliang@3080server:~$ conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch

(py38mmyolo) houjinliang@3080server:~$ python -c "import torch; print(torch.__version__); print(torch.cuda.is_available())"
1.11.0
True

pip install -U openmim
mim install "mmengine>=0.6.0"
mim install "mmcv>=2.0.0rc4,<2.1.0"
mim install "mmdet>=3.0.0,<4.0.0"

git clone https://github.com/open-mmlab/mmyolo.git
cd mmyolo
pip install -r requirements/albu.txt
mim install -v -e .

(base) houjinliang@3080server:~/userdoc/offlinefile$ wget http://images.cocodataset.org/zips/val2017.zip
--2024-01-10 16:17:46--  http://images.cocodataset.org/zips/val2017.zip
正在解析主机 images.cocodataset.org (images.cocodataset.org)... 3.5.7.141, 52.216.215.25, 52.216.185.83, ...
正在连接 images.cocodataset.org (images.cocodataset.org)|3.5.7.141|:80... 已连接。
已发出 HTTP 请求,正在等待回应... 200 OK
长度: 815585330 (778M) [application/zip]
正在保存至: “val2017.zip”

val2017.zip 100%[===================================================================================================================>] 777.80M 3.89MB/s 用时 2m 22ss

2024-01-10 16:20:08 (5.48 MB/s) - 已保存 “val2017.zip” [815585330/815585330])
Checking directory disk usage
(py38mmyolo) houjinliang@3080server:~/userdoc/offlinefile$ ll
总用量 26251480
drwxrwxr-x 6 houjinliang houjinliang        4096 1月  10 21:36 ./
drwxrwxr-x 9 houjinliang houjinliang        4096 1月  10 15:59 ../
-rw-rw-r-- 1 houjinliang houjinliang     3996930 1月  10 14:43 clash-linux-amd64-v1.18.0.gz
drwxr-xr-x 5 houjinliang houjinliang        4096 8月  26  2022 coco/
-rw-rw-r-- 1 houjinliang houjinliang     6983030 1月  10 17:00 coco128.zip
-rw-rw-r-- 1 houjinliang houjinliang    48639045 1月  10 16:21 coco2017labels.zip
-rw-rw-r-- 1 houjinliang houjinliang     4372979 1月  10 14:48 curl-8.5.0.tar.gz
-rw-rw-r-- 1 houjinliang houjinliang    12353723 1月   5 16:32 pandas-2.0.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
drwxrwxr-x 2 houjinliang houjinliang     1429504 8月  31  2017 test2017/
-rw-rw-r-- 1 houjinliang houjinliang  6646970404 1月  10 17:47 test2017.zip
drwxrwxr-x 2 houjinliang houjinliang     4112384 8月  31  2017 train2017/
-rw-rw-r-- 1 houjinliang houjinliang 19336861798 1月  10 21:35 train2017.zip
drwxrwxr-x 2 houjinliang houjinliang      167936 8月  31  2017 val2017/
-rw-rw-r-- 1 houjinliang houjinliang   815585330 7月  11  2018 val2017.zip

(py38mmyolo) houjinliang@3080server:~/userdoc/offlinefile$ ll -hl
总用量 26G
drwxrwxr-x 6 houjinliang houjinliang 4.0K 1月  10 21:36 ./
drwxrwxr-x 9 houjinliang houjinliang 4.0K 1月  10 15:59 ../
-rw-rw-r-- 1 houjinliang houjinliang 3.9M 1月  10 14:43 clash-linux-amd64-v1.18.0.gz
drwxr-xr-x 5 houjinliang houjinliang 4.0K 8月  26  2022 coco/
-rw-rw-r-- 1 houjinliang houjinliang 6.7M 1月  10 17:00 coco128.zip
-rw-rw-r-- 1 houjinliang houjinliang  47M 1月  10 16:21 coco2017labels.zip
-rw-rw-r-- 1 houjinliang houjinliang 4.2M 1月  10 14:48 curl-8.5.0.tar.gz
-rw-rw-r-- 1 houjinliang houjinliang  12M 1月   5 16:32 pandas-2.0.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
drwxrwxr-x 2 houjinliang houjinliang 1.4M 8月  31  2017 test2017/
-rw-rw-r-- 1 houjinliang houjinliang 6.2G 1月  10 17:47 test2017.zip
drwxrwxr-x 2 houjinliang houjinliang 4.0M 8月  31  2017 train2017/
-rw-rw-r-- 1 houjinliang houjinliang  19G 1月  10 21:35 train2017.zip
drwxrwxr-x 2 houjinliang houjinliang 164K 8月  31  2017 val2017/
-rw-rw-r-- 1 houjinliang houjinliang 778M 7月  11  2018 val2017.zip
To see the total size of the current directory together with the space used by each subdirectory one level below it (add -a to list individual files as well):
(py38mmyolo) houjinliang@3080server:~$ du -h --max-depth=1
6.5M    ./.config
8.0K    ./.conda
1.1G    ./.vscode-server
12G     ./cuda-11.3
86G     ./userdoc
8.0K    ./.gnupg
16K     ./.ssh
8.0K    ./.nv
2.7G    ./.cache
24G     ./miniconda3
125G    .
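The `du` listing comes back in directory order, so the largest entries are not obvious at a glance. A small sketch that sorts the same output by size follows. `dir_usage` is a hypothetical wrapper name, and it relies on GNU `du` (`--max-depth`) and GNU `sort` (`-h`), which are assumed to be available as on Ubuntu.

```shell
# dir_usage is a hypothetical wrapper (an assumption, not from the notes):
# per-directory disk usage one level deep, largest first.
dir_usage() {
    # 2>/dev/null hides "permission denied" noise from unreadable subdirectories.
    du -h --max-depth=1 "${1:-.}" 2>/dev/null | sort -hr
}
```

Running `dir_usage ~` would list the home directory total first, followed by its subdirectories in descending size.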
3090Server
Ubuntu 18.04.6 LTS
gcc version 7.5.0
CUDA 11.3
cuDNN 8.9.5
System details
Welcome to Ubuntu 18.04.6 LTS (GNU/Linux 5.4.0-150-generic x86_64)
Model name: Intel(R) Xeon(R) CPU E5-2699C v4 @ 2.20GHz
NVIDIA Corporation GA102 [GeForce RTX 3090] (rev a1)

NV Driver
(base) houjinliang@3090server:~$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 515.65.01 Wed Jul 20 14:00:58 UTC 2022
GCC version: gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)

Home directory
houjinliang@3090server:~$ ll
total 40
drwxr-xr-x  4 houjinliang houjinliang 4096 6月  26 10:18 ./
drwxrwxrwx 21 super       super       4096 6月  26 10:17 ../
-rw-r--r--  1 houjinliang houjinliang  220 4月   5  2018 .bash_logout
-rw-r--r--  1 houjinliang houjinliang 3771 4月   5  2018 .bashrc
drwx------  2 houjinliang houjinliang 4096 6月  26 10:18 .cache/
-rw-r--r--  1 houjinliang houjinliang 8980 4月  16  2018 examples.desktop
drwx------  3 houjinliang houjinliang 4096 6月  26 10:18 .gnupg/
-rw-r--r--  1 houjinliang houjinliang  807 4月   5  2018 .profile
Miniconda
Download the Miniconda .sh installer script, make it executable, and then run it.
houjinliang@3090server:~/MyDownloadFiles$ wget -c https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
houjinliang@3090server:~/MyDownloadFiles$ chmod +x Miniconda3-latest-Linux-x86_64.sh
houjinliang@3090server:~/MyDownloadFiles$ ./Miniconda3-latest-Linux-x86_64.sh

During installation you are asked to choose an install path; just accept the default.
Miniconda3 will now be installed into this location:
/mnt/houjinliang/miniconda3

  - Press ENTER to confirm the location
  - Press CTRL-C to abort the installation
  - Or specify a different location below

[/mnt/houjinliang/miniconda3] >>>
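Enter yes here and the installer configures ~/.bashrc automatically; close the terminal and open a new one, and (base) appears in front of the prompt.
If you enter no instead, add the content below to ~/.bashrc by hand.
Right after installation the conda command is not yet recognized in the terminal; the shell environment has to be configured first.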
(base) houjinliang@3090server:~$ vim ~/.bashrc

__conda_setup="$('/mnt/houjinliang/miniconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/mnt/houjinliang/miniconda3/etc/profile.d/conda.sh" ]; then
        . "/mnt/houjinliang/miniconda3/etc/profile.d/conda.sh"
    else
        export PATH="/mnt/houjinliang/miniconda3/bin:$PATH"
    fi
fi
unset __conda_setup

houjinliang@3090server:~$ source ~/.bashrc
(base) houjinliang@3090server:~$
Check Miniconda's basic information.
(base) houjinliang@3090server:~$ conda info

     active environment : base
    active env location : /mnt/houjinliang/miniconda3
            shell level : 1
       user config file : /mnt/houjinliang/.condarc
 populated config files :
          conda version : 24.4.0
    conda-build version : not installed
         python version : 3.12.3.final.0
                 solver : libmamba (default)
       virtual packages : __archspec=1=broadwell
                          __conda=24.4.0=0
                          __cuda=11.7=0
                          __glibc=2.27=0
                          __linux=5.4.0=0
                          __unix=0=0
       base environment : /mnt/houjinliang/miniconda3 (writable)
      conda av data dir : /mnt/houjinliang/miniconda3/etc/conda
  conda av metadata url : None
           channel URLs : https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /mnt/houjinliang/miniconda3/pkgs
                          /mnt/houjinliang/.conda/pkgs
       envs directories : /mnt/houjinliang/miniconda3/envs
                          /mnt/houjinliang/.conda/envs
               platform : linux-64
             user-agent : conda/24.4.0 requests/2.31.0 CPython/3.12.3 Linux/5.4.0-150-generic ubuntu/18.04.6 glibc/2.27 solver/libmamba conda-libmamba-solver/24.1.0 libmambapy/1.5.8 aau/0.4.4 c/. s/. e/.
                UID:GID : 1035:1035
             netrc file : None
           offline mode : False
Switching conda to the Aliyun mirror
Reference: https://developer.aliyun.com/article/1291651
Switching pip to the Aliyun mirror
The simplest way is a single command, as follows.
(base) houjinliang@3090server:~$ pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/
Writing to /mnt/houjinliang/.config/pip/pip.conf

Alternatively, edit ~/.config/pip/pip.conf (create it if it does not exist) so that it contains the following:

(base) houjinliang@3090server:~$ cat ~/.config/pip/pip.conf
[global]
index-url = https://mirrors.aliyun.com/pypi/simple/
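For the manual route, the file can also be created from the shell. The following is a minimal sketch. `write_pip_conf` is a hypothetical helper name, and the default path and URL are simply the ones shown in the `pip.conf` listing. It refuses to overwrite an existing file so that any other settings are not lost.

```shell
# write_pip_conf is a hypothetical helper (an assumption, not from the notes).
# It creates a pip config file holding a single [global] index-url entry.
write_pip_conf() {
    local conf="${1:-$HOME/.config/pip/pip.conf}"
    local url="${2:-https://mirrors.aliyun.com/pypi/simple/}"
    if [ -e "$conf" ]; then
        echo "refusing to overwrite $conf" >&2
        return 1
    fi
    # Create the parent directory first; it may not exist on a fresh account.
    mkdir -p "$(dirname "$conf")"
    printf '[global]\nindex-url = %s\n' "$url" > "$conf"
}
```

Afterwards `pip config list` should report the new `global.index-url` value.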
NV Driver
(base) houjinliang@3080server:~$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 525.60.11 Wed Nov 23 23:04:03 UTC 2022
GCC version: gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)

CUDA 11.3.1 & CUDNN 8.9.5
The CUDA version is the same as on the previous server, so the installation follows the steps above.

(base) houjinliang@3090server:~/MyDownloadFiles$ wget https://developer.download.nvidia.com/compute/cuda/11.3.1/local_installers/cuda_11.3.1_465.19.01_linux.run

(base) houjinliang@3090server:~/MyDownloadFiles$ ll
total 3224920
drwxrwxr-x  2 houjinliang houjinliang       4096 6月  26 11:10 ./
drwxr-xr-x 10 houjinliang houjinliang       4096 6月  26 11:04 ../
-rw-rw-r--  1 houjinliang houjinliang 3158494112 5月  14  2021 cuda_11.3.1_465.19.01_linux.run
-rwxrwxr-x  1 houjinliang houjinliang  143808873 5月  21 02:15 Miniconda3-latest-Linux-x86_64.sh*

(base) houjinliang@3090server:~/MyDownloadFiles$ chmod +x cuda_11.3.1_465.19.01_linux.run

(base) houjinliang@3090server:~/MyDownloadFiles$ ll
total 3224920
drwxrwxr-x  2 houjinliang houjinliang       4096 6月  26 11:10 ./
drwxr-xr-x 10 houjinliang houjinliang       4096 6月  26 11:04 ../
-rwxrwxr-x  1 houjinliang houjinliang 3158494112 5月  14  2021 cuda_11.3.1_465.19.01_linux.run*
-rwxrwxr-x  1 houjinliang houjinliang  143808873 5月  21 02:15 Miniconda3-latest-Linux-x86_64.sh*

(base) houjinliang@3090server:~/MyDownloadFiles$ ./cuda_11.3.1_465.19.01_linux.run
If a warning prompt appears at this point, don't be alarmed: just choose Continue and then follow the steps below.
Installation complete
(base) houjinliang@3090server:~/MyDownloadFiles$ ./cuda_11.3.1_465.19.01_linux.run
===========
= Summary =
===========

Driver:   Not Selected
Toolkit:  Installed in /mnt/houjinliang/cuda-11.3/
Samples:  Not Selected

Please make sure that
 - PATH includes /mnt/houjinliang/cuda-11.3/bin
 - LD_LIBRARY_PATH includes /mnt/houjinliang/cuda-11.3/lib64, or, add /mnt/houjinliang/cuda-11.3/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /mnt/houjinliang/cuda-11.3/bin
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 465.00 is required for CUDA 11.3 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
    sudo <CudaInstaller>.run --silent --driver

Logfile is /tmp/cuda-installer.log
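Once installation is finished, it is best to delete the /tmp/cuda-installer.log file. If it is left in place, it interferes with later installs by other users: the log is owned by your account, so another user's installer run may be unable to write to it. Delete it so it doesn't get in anyone else's way.
Configure the CUDA Toolkit environment variables, using vim or VS Code.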
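A cautious way to do that cleanup is sketched below. `cleanup_installer_log` is a hypothetical helper name. It deletes the log only when the file exists and belongs to the current user, which is checked with the shell's `-O` file test.

```shell
# cleanup_installer_log is a hypothetical helper (an assumption, not from the notes).
# It removes the CUDA installer log only if the current user owns it.
cleanup_installer_log() {
    local log="${1:-/tmp/cuda-installer.log}"
    if [ ! -e "$log" ]; then
        echo "nothing to clean: $log"
    elif [ -O "$log" ]; then
        rm -f -- "$log" && echo "removed $log"
    else
        # Someone else created it; only they (or root) can delete it in /tmp.
        echo "not owned by you, ask its owner or an admin: $log" >&2
        return 1
    fi
}
```

Running `cleanup_installer_log` right after the installer exits keeps `/tmp` clear for the next user.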
(base) houjinliang@3090server:~$ vim ~/.bashrc

# CUDA_HOME must be a single directory, so it is set directly rather than appended to
export CUDA_HOME=/mnt/houjinliang/cuda-11.3
export PATH=$PATH:/mnt/houjinliang/cuda-11.3/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/mnt/houjinliang/cuda-11.3/lib64

(base) houjinliang@3090server:~$ source ~/.bashrc

(base) houjinliang@3090server:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_May__3_19:15:13_PDT_2021
Cuda compilation tools, release 11.3, V11.3.109
Build cuda_11.3.r11.3/compiler.29920130_0
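cuDNN installation. Downloading cuDNN requires logging in to an account on the NVIDIA website, so here I simply reuse the archive already downloaded for the earlier installation.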
(base) houjinliang@3090server:~/MyDownloadFiles$ ll
total 4062292
drwxrwxr-x  2 houjinliang houjinliang       4096 6月  26 11:56 ./
drwxr-xr-x 11 houjinliang houjinliang       4096 6月  26 11:48 ../
-rwxrwxr-x  1 houjinliang houjinliang 3158494112 5月  14  2021 cuda_11.3.1_465.19.01_linux.run*
-rw-rw-r--  1 houjinliang houjinliang  857460936 6月  26 11:57 cudnn-linux-x86_64-8.9.5.29_cuda11-archive.tar.xz
-rwxrwxr-x  1 houjinliang houjinliang  143808873 5月  21 02:15 Miniconda3-latest-Linux-x86_64.sh*

(base) houjinliang@3090server:~/MyDownloadFiles$ tar xvJf cudnn-linux-x86_64-8.9.5.29_cuda11-archive.tar.xz
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/lib/
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/lib/libcudnn_adv_infer_static.a
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/lib/libcudnn_adv_infer_static_v8.a
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/lib/libcudnn_adv_train_static.a
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/lib/libcudnn_adv_train_static_v8.a
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/lib/libcudnn_cnn_infer_static.a
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/lib/libcudnn_cnn_infer_static_v8.a
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/lib/libcudnn_cnn_train_static.a
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/lib/libcudnn_cnn_train_static_v8.a
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/lib/libcudnn_ops_infer_static.a
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/lib/libcudnn_ops_infer_static_v8.a
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/lib/libcudnn_ops_train_static.a
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/lib/libcudnn_ops_train_static_v8.a
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/lib/libcudnn.so.8
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/lib/libcudnn.so
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/lib/libcudnn.so.8.9.5
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/lib/libcudnn_adv_infer.so
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/lib/libcudnn_adv_infer.so.8.9.5
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/lib/libcudnn_adv_infer.so.8
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/lib/libcudnn_adv_train.so.8.9.5
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/lib/libcudnn_adv_train.so.8
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/lib/libcudnn_adv_train.so
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/lib/libcudnn_cnn_infer.so.8
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/lib/libcudnn_cnn_infer.so
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/lib/libcudnn_cnn_infer.so.8.9.5
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/lib/libcudnn_cnn_train.so.8.9.5
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/lib/libcudnn_cnn_train.so.8
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/lib/libcudnn_cnn_train.so
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/lib/libcudnn_ops_infer.so
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/lib/libcudnn_ops_infer.so.8
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/lib/libcudnn_ops_infer.so.8.9.5
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/lib/libcudnn_ops_train.so
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/lib/libcudnn_ops_train.so.8
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/lib/libcudnn_ops_train.so.8.9.5
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/include/
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/include/cudnn_v8.h
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/include/cudnn_adv_infer_v8.h
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/include/cudnn_adv_train_v8.h
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/include/cudnn_backend_v8.h
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/include/cudnn_cnn_infer_v8.h
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/include/cudnn_cnn_train_v8.h
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/include/cudnn_ops_infer_v8.h
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/include/cudnn_ops_train_v8.h
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/include/cudnn_version_v8.h
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/include/cudnn.h
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/include/cudnn_adv_infer.h
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/include/cudnn_adv_train.h
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/include/cudnn_backend.h
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/include/cudnn_cnn_infer.h
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/include/cudnn_cnn_train.h
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/include/cudnn_ops_infer.h
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/include/cudnn_ops_train.h
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/include/cudnn_version.h
cudnn-linux-x86_64-8.9.5.29_cuda11-archive/LICENSE

(base) houjinliang@3090server:~/MyDownloadFiles$ cd cudnn-linux-x86_64-8.9.5.29_cuda11-archive/

(base) houjinliang@3090server:~/MyDownloadFiles/cudnn-linux-x86_64-8.9.5.29_cuda11-archive$ ll
total 48
drwxr-xr-x 4 houjinliang houjinliang  4096 9月  7  2023 ./
drwxrwxr-x 3 houjinliang houjinliang  4096 6月 26 11:58 ../
drwxr-xr-x 2 houjinliang houjinliang  4096 9月  7  2023 include/
drwxr-xr-x 2 houjinliang houjinliang  4096 9月  7  2023 lib/
-rw-r--r-- 1 houjinliang houjinliang 29662 9月  7  2023 LICENSE

(base) houjinliang@3090server:~/MyDownloadFiles/cudnn-linux-x86_64-8.9.5.29_cuda11-archive$ cp lib/* ~/cuda-11.3/lib64/
(base) houjinliang@3090server:~/MyDownloadFiles/cudnn-linux-x86_64-8.9.5.29_cuda11-archive$ cp include/* ~/cuda-11.3/include
(base) houjinliang@3090server:~/MyDownloadFiles/cudnn-linux-x86_64-8.9.5.29_cuda11-archive$ chmod +x ~/cuda-11.3/include/cudnn.h
(base) houjinliang@3090server:~/MyDownloadFiles/cudnn-linux-x86_64-8.9.5.29_cuda11-archive$ chmod +x ~/cuda-11.3/lib64/libcudnn*

(base) houjinliang@3090server:~/MyDownloadFiles/cudnn-linux-x86_64-8.9.5.29_cuda11-archive$ cat ~/cuda-11.3/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
--
/* cannot use constexpr here since this is a C-only file */
Git & GitHub
(base) houjinliang@3090server:~$ git config --global user.name 'hjl_3090server'
(base) houjinliang@3090server:~$ git config --global user.email 'cosmicdustycn@outlook.com'
(base) houjinliang@3090server:~$ ssh-keygen -t rsa -C "cosmicdustycn@outlook.com"
Generating public/private rsa key pair.
Enter file in which to save the key (/mnt/houjinliang/.ssh/id_rsa):
Created directory '/mnt/houjinliang/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /mnt/houjinliang/.ssh/id_rsa.
Your public key has been saved in /mnt/houjinliang/.ssh/id_rsa.pub.

(base) houjinliang@3090server:~/.ssh$ pwd
/mnt/houjinliang/.ssh
(base) houjinliang@3090server:~/.ssh$ ll
total 16
drwx------  2 houjinliang houjinliang 4096 6月 26 12:11 ./
drwxr-xr-x 12 houjinliang houjinliang 4096 6月 26 12:11 ../
-rw-------  1 houjinliang houjinliang 1679 6月 26 12:11 id_rsa
-rw-r--r--  1 houjinliang houjinliang  407 6月 26 12:11 id_rsa.pub

(base) houjinliang@3090server:~$ git config user.name
hjl_3090server
(base) houjinliang@3090server:~$ git config user.email
cosmicdustycn@outlook.com
(base) houjinliang@3090server:~$ ssh -T git@github.com
The authenticity of host 'github.com (20.205.243.166)' can't be established.
ECDSA key fingerprint is xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'github.com,20.205.243.166' (ECDSA) to the list of known hosts.
Hi murphyhoucn! You've successfully authenticated, but GitHub does not provide shell access.
3090Server2
Ubuntu 20.04.5 LTS
gcc version 9.4.0
CUDA 11.3
cuDNN 8.9.5
NV Driver
(base) houjinliang@3090server2:~$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 535.183.01 Sun May 12 19:39:15 UTC 2024
GCC version: gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.2)
4090Server
Ubuntu 22.04.2 LTS
gcc 11.4.0
CUDA11.6 : cuda_11.6.2_510.47.03_linux.run
cuDNN 8.9.5: cudnn-linux-x86_64-8.9.5.29_cuda11-archive.tar.xz
NV Driver
(sr_benchmark) houjinliang@4090server:~$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 535.183.06 Wed Jun 26 06:46:07 UTC 2024
GCC version: gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04)

CUDA 11.6 & cuDNN 8.9.5
(base) houjinliang@4090server:~/MyDownloadFiles$ ./cuda_11.6.2_510.47.03_linux.run

(base) houjinliang@3090server:~/MyDownloadFiles$ cd cudnn-linux-x86_64-8.9.5.29_cuda11-archive/
The installation process is the same as above; remember to replace every 11.3 with 11.6.
After that, configure Git.
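As for the conda env, I packed the environment from the previous server with conda-pack, copied it over with scp, and unpacked it into the corresponding folder. The old environment targeted CUDA 11.3 (the PyTorch build is also the 11.3 one), yet it still works on this CUDA 11.6 server, so I'll keep using it for now. This is plausible because the conda install shown earlier pulls in its own cudatoolkit=11.3 runtime, so PyTorch likely depends on the NVIDIA driver rather than on the system-wide toolkit version.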
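The pack, copy and unpack steps can be sketched as follows. This is an illustration of the usual conda-pack workflow, not a record of the exact commands that were run. `print_env_migration` is a hypothetical helper that only prints the commands for a given environment name, target host and target `envs` directory; all three arguments are placeholders to fill in.

```shell
# print_env_migration is a hypothetical helper (an assumption, not from the notes).
# It only PRINTS the typical conda-pack migration steps; nothing is executed.
print_env_migration() {
    local env="$1" host="$2" dest="$3"
    cat <<EOF
# on the old server
conda pack -n $env -o $env.tar.gz
scp $env.tar.gz $host:$dest/
# on the new server
mkdir -p $dest/$env
tar -xzf $dest/$env.tar.gz -C $dest/$env
source $dest/$env/bin/activate
conda-unpack
EOF
}
```

For instance, `print_env_migration sr_benchmark user@newhost /path/to/miniconda3/envs` lists the steps for one environment. The final `conda-unpack` call rewrites the path prefixes recorded inside the unpacked environment.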
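Problem: Failed to initialize NVML: Driver/library version mismatch
The environment had been running normally for a long time, but one day this error suddenly appeared when running a program!
ERROR: cuda is not available, try running on CPU
This error is a message written in my own program. So the system's CUDA has become unavailable?! What happened?!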
(base) houjinliang@4090server:~$ nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
NVML library version: 535.216

(base) houjinliang@4090server:~$ nvitop
NVML ERROR: RM has detected an NVML/RM version mismatch.

(base) houjinliang@4090server:~$ gpustat
Error on querying NVIDIA devices. Use --debug flag to see more details.
RM has detected an NVML/RM version mismatch.

(base) houjinliang@4090server:~$ gpustat --debug
Error on querying NVIDIA devices. Use --debug flag to see more details.
RM has detected an NVML/RM version mismatch.
Traceback (most recent call last):
  File "/mnt/houjinliang/miniconda3/lib/python3.12/site-packages/gpustat/cli.py", line 58, in print_gpustat
    gpu_stats = GPUStatCollection.new_query(debug=debug, id=id)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/houjinliang/miniconda3/lib/python3.12/site-packages/gpustat/core.py", line 402, in new_query
    N.nvmlInit()
  File "/mnt/houjinliang/miniconda3/lib/python3.12/site-packages/pynvml.py", line 1947, in nvmlInit
    nvmlInitWithFlags(0)
  File "/mnt/houjinliang/miniconda3/lib/python3.12/site-packages/pynvml.py", line 1937, in nvmlInitWithFlags
    _nvmlCheckReturn(ret)
  File "/mnt/houjinliang/miniconda3/lib/python3.12/site-packages/pynvml.py", line 899, in _nvmlCheckReturn
    raise NVMLError(ret)
pynvml.NVMLError_LibRmVersionMismatch: RM has detected an NVML/RM version mismatch.

(sr_benchmark) houjinliang@4090server:~$ python
Python 3.8.19 (default, Mar 20 2024, 19:58:24)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.cuda.is_available())
/mnt/houjinliang/miniconda3/envs/sr_benchmark/lib/python3.8/site-packages/torch/cuda/__init__.py:80: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:112.)
  return torch._C._cuda_getDeviceCount() > 0
False
Failed to initialize NVML: Driver/library version mismatch 的解决方法 (how to fix it) - Zhihu
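In the failing `nvidia-smi` output, NVML reports library version 535.216, whereas the driver check recorded for this server earlier shows kernel module 535.183.06. A mismatch like this often means the user-space driver libraries were updated while the older kernel module is still loaded. That diagnosis is an assumption here, not something the notes confirm. The sketch below reads the loaded module's version so the two numbers can be compared; `nvidia_kmod_version` is a hypothetical helper name.

```shell
# nvidia_kmod_version is a hypothetical helper (an assumption, not from the notes).
# It prints the version of the currently loaded NVIDIA kernel module,
# taken from the NVRM line of /proc/driver/nvidia/version.
nvidia_kmod_version() {
    local f="${1:-/proc/driver/nvidia/version}"
    if [ ! -r "$f" ]; then
        echo "cannot read $f" >&2
        return 1
    fi
    # The first dotted number on the NVRM line is the module version.
    grep '^NVRM version' "$f" | grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1
}
```

If the printed version differs from the "NVML library version" that `nvidia-smi` reports, the kernel module and the libraries are out of step.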
4090Server2
Ubuntu 22.04.3 LTS
gcc version 12.3.0
NV Driver
(base) houjinliang@4090server2:~$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 550.107.02 Wed Jul 24 23:53:00 UTC 2024
GCC version: gcc version 12.3.0 (Ubuntu 12.3.0-1ubuntu1~22.04)

CUDA 12.4.1 & cuDNN 8.9.7
CUDA 12.4.1: CUDA Toolkit 12.4 Update 1 Downloads | NVIDIA Developer
(base) houjinliang@4090server2:~/MyDownloadFiles$ wget https://developer.download.nvidia.com/compute/cuda/12.4.1/local_installers/cuda_12.4.1_550.54.15_linux.run
记得把这个log文件删掉!
配置CUDA的环境变量
1 (base) houjinliang@4090server2:~/MyDownloadFiles$ vim ~/.bashrc
1 2 3 4 5 6 # >>> cuda environment variables >>> # murpy insert export CUDA_HOME=$CUDA_HOME:/data/houjinliang/cuda-12.4 export PATH=$PATH:/data/houjinliang/cuda-12.4/bin export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/data/houjinliang/cuda-12.4/lib64 # <<< cuda environment variables <<<
(base) houjinliang@4090server2:~/MyDownloadFiles$ source ~/.bashrc
(base) houjinliang@4090server2:~/MyDownloadFiles$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0
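Because `~/.bashrc` may be sourced more than once in a session, plain `PATH=$PATH:...` lines keep appending duplicate entries. Below is a small sketch of an idempotent alternative. `path_append` is a hypothetical helper written for this note, and the CUDA directory is the one used in the export lines above.

```shell
# Append a directory to a colon-separated list variable only if it is not
# already present. Usage: path_append VAR_NAME DIR
# (path_append is a hypothetical helper name.)
path_append() {
    eval "_pa_current=\${$1:-}"
    case ":$_pa_current:" in
        *":$2:"*) ;;                                       # already present, nothing to do
        *) eval "$1=\${_pa_current:+\$_pa_current:}\$2" ;; # append without a leading colon
    esac
    export "$1"
}

# Same settings as the .bashrc block above, but safe to source repeatedly.
CUDA_DIR=/data/houjinliang/cuda-12.4
export CUDA_HOME="$CUDA_DIR"
path_append PATH "$CUDA_DIR/bin"
path_append LD_LIBRARY_PATH "$CUDA_DIR/lib64"
```

The helper also avoids a stray leading colon when the variable starts out empty, which is often the case for `LD_LIBRARY_PATH`.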
cuDNN: cudnn-linux-x86_64-8.9.7.29_cuda12-archive.tar.xz
https://developer.nvidia.com/downloads/compute/cudnn/secure/8.9.7/local_installers/12.x/cudnn-linux-x86_64-8.9.7.29_cuda12-archive.tar.xz/
(base) houjinliang@4090server2:~/MyDownloadFiles$ tar xvJf cudnn-linux-x86_64-8.9.7.29_cuda12-archive.tar.xz
(base) houjinliang@4090server2:~/MyDownloadFiles$ cd cudnn-linux-x86_64-8.9.7.29_cuda12-archive/
(base) houjinliang@4090server2:~/MyDownloadFiles/cudnn-linux-x86_64-8.9.7.29_cuda12-archive$ ll
total 48
drwxr-xr-x 4 houjinliang houjinliang 4096 11月 30 2023 ./
drwxrwxr-x 3 houjinliang houjinliang 4096 10月 24 22:53 ../
drwxr-xr-x 2 houjinliang houjinliang 4096 11月 30 2023 include/
drwxr-xr-x 2 houjinliang houjinliang 4096 11月 30 2023 lib/
-rw-r--r-- 1 houjinliang houjinliang 29662 11月 30 2023 LICENSE

(base) houjinliang@4090server2:~/MyDownloadFiles/cudnn-linux-x86_64-8.9.7.29_cuda12-archive$ cp lib/* ~/cuda-12.4/lib64/
(base) houjinliang@4090server2:~/MyDownloadFiles/cudnn-linux-x86_64-8.9.7.29_cuda12-archive$ cp include/* ~/cuda-12.4/include
(base) houjinliang@4090server2:~/MyDownloadFiles/cudnn-linux-x86_64-8.9.7.29_cuda12-archive$ chmod +x ~/cuda-12.4/include/cudnn.h
(base) houjinliang@4090server2:~/MyDownloadFiles/cudnn-linux-x86_64-8.9.7.29_cuda12-archive$ chmod +x ~/cuda-12.4/lib64/libcudnn*

(base) houjinliang@4090server2:~/MyDownloadFiles/cudnn-linux-x86_64-8.9.7.29_cuda12-archive$ cat ~/cuda-12.4/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
--
/* cannot use constexpr here since this is a C-only file */
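The `grep` output above is cut off, so it does not actually show the installed version. As a sketch, the three version macros can be pulled out of `cudnn_version.h` and joined into one string. `cudnn_version_from_header` is a hypothetical helper, and it assumes the header defines `CUDNN_MAJOR`, `CUDNN_MINOR`, and `CUDNN_PATCHLEVEL` on plain `#define` lines, as cuDNN 8 headers do.

```shell
# Print MAJOR.MINOR.PATCHLEVEL from a cudnn_version.h file.
# cudnn_version_from_header is a hypothetical helper name.
cudnn_version_from_header() {
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { if (major != "") print major "." minor "." patch }
    ' "$1"
}

# Header location used in the copy commands above.
header="$HOME/cuda-12.4/include/cudnn_version.h"
if [ -r "$header" ]; then
    cudnn_version_from_header "$header"
fi
```

Matching on the exact macro name skips the derived `CUDNN_VERSION` line, which is an expression rather than a number.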
git install
This server has no Git installed, so unpack one from a .deb package into the home directory (no root access needed).
(base) houjinliang@4090server2:~/MyDownloadFiles$ wget http://archive.ubuntu.com/ubuntu/pool/main/g/git/git_2.34.1-1ubuntu1.11_amd64.deb
(base) houjinliang@4090server2:~/MyDownloadFiles$ cd ~
(base) houjinliang@4090server2:~$ mkdir git
(base) houjinliang@4090server2:~$ dpkg -x ./MyDownloadFiles/git_2.34.1-1ubuntu1.11_amd64.deb ./git
(base) houjinliang@4090server2:~$ cd git/
(base) houjinliang@4090server2:~/git$ ll
total 20
drwxr-xr-x 5 houjinliang houjinliang 4096 5月 20 20:14 ./
drwxr-x--- 14 houjinliang houjinliang 4096 10月 24 23:22 ../
drwxr-xr-x 3 houjinliang houjinliang 4096 5月 20 20:14 etc/
drwxr-xr-x 5 houjinliang houjinliang 4096 5月 20 20:14 usr/
drwxr-xr-x 3 houjinliang houjinliang 4096 5月 20 20:14 var/
(base) houjinliang@4090server2:~$ vim ~/.bashrc
export PATH=$PATH:~/git/usr/bin
export GIT_EXEC_PATH=~/git/usr/lib/git-core
(base) houjinliang@4090server2:~$ source ~/.bashrc
(base) houjinliang@4090server2:~$ git --version
git version 2.34.1
git configuration
(base) houjinliang@4090server2:~$ git config --global user.name 'hjl_4090server2'
(base) houjinliang@4090server2:~$ git config --global user.email 'cosmicdustycn@outlook.com'
(base) houjinliang@4090server2:~$ ssh-keygen -t rsa -C "cosmicdustycn@outlook.com"
Generating public/private rsa key pair.
Enter file in which to save the key (/data/houjinliang/.ssh/id_rsa):
Created directory '/data/houjinliang/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /data/houjinliang/.ssh/id_rsa
Your public key has been saved in /data/houjinliang/.ssh/id_rsa.pub

(base) houjinliang@4090server2:~$ cat ~/.ssh/id_rsa.pub

(base) houjinliang@4090server2:~$ git config user.name
hjl_4090server2
(base) houjinliang@4090server2:~$ git config user.email
cosmicdustycn@outlook.com

(base) houjinliang@4090server2:~$ ssh -T git@github.com
The authenticity of host 'github.com (20.205.243.166)' can't be established.
ED25519 key fingerprint is SHA256:+DiY3wvvV6TuJJhbpZisF/zLDA0zPMSvHdkr4UvCOqU.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added 'github.com' (ED25519) to the list of known hosts.
Hi murphyhoucn! You've successfully authenticated, but GitHub does not provide shell access.
conda env
Although the CUDA environment on 4090server2 is 12.4, I still reused the sr_benchmark environment that was originally set up on the 3080 server.
(base) houjinliang@4090server2:~$ mkdir ~/miniconda3/envs/sr_benchmark
(base) houjinliang@4090server2:~$ tar -xzvf ./MyDownloadFiles/sr_benchmark.tar.gz -C ~/miniconda3/envs/sr_benchmark
(base) houjinliang@4090server2:~$ conda env list
base                  *  /data/houjinliang/miniconda3
sr_benchmark             /data/houjinliang/miniconda3/envs/sr_benchmark

(base) houjinliang@4090server2:~$ conda activate sr_benchmark
(sr_benchmark) houjinliang@4090server2:~$ python
Python 3.8.19 (default, Mar 20 2024, 19:58:24)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.cuda.is_available())
True
>>>

torch 1.10.1+cu113
torchvision 0.11.2+cu113
Reference links
CUDA 12.6 Update 2 Release Notes
GCC and CUDA version correspondence
3080Server - gcc 7.5.0 (Ubuntu 18.04.6 LTS)-> CUDA 11.3
3090Server - gcc 7.5.0 (Ubuntu 18.04.6 LTS)-> CUDA 11.3
3090Server2 - gcc 9.4.0 (Ubuntu 20.04.5 LTS)-> CUDA 11.3
4090Server - gcc 11.4.0 (Ubuntu 22.04.2 LTS)-> CUDA 11.6
4090Server2 - gcc 12.3.0 (Ubuntu 22.04.3 LTS)-> CUDA 12.4
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
cuDNN docs
CUDA Toolkit Archive | NVIDIA Developer
cuDNN Archive
Docker
Docker Install
Requires an administrator (sudo) user!
sudo apt update

sudo apt install \
    apt-transport-https \
    ca-certificates \
    curl \
    gnupg \
    lsb-release

curl -fsSL https://mirrors.aliyun.com/docker-ce/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg

echo \
  "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://mirrors.aliyun.com/docker-ce/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

sudo apt update
sudo apt install docker-ce docker-ce-cli containerd.io

sudo systemctl enable docker
sudo systemctl start docker
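To let non-administrator users run Docker as well, create a docker user group and add those users to it; members of the group are granted access to Docker.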
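# Create the docker group (it may already exist after installation)
sudo groupadd docker

# Add the current user to the group
sudo usermod -aG docker $USER

# Add another user to the group (replace xxxxxxxx with the user name)
sudo usermod -aG docker xxxxxxxx

# Check who is in the group (either command works)
getent group docker
grep '^docker:' /etc/group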
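To check one specific account rather than reading the group line by eye, the member list can be tested directly. This is a sketch that assumes the usual `name:password:GID:member1,member2` format printed by `getent group`; `group_line_has_user` is a hypothetical helper.

```shell
# Return success if the user appears in the member list of a group line.
# $1: one line of `getent group` output, $2: user name
# (group_line_has_user is a hypothetical helper name.)
group_line_has_user() {
    members=${1##*:}              # keep everything after the last colon
    case ",$members," in
        *",$2,"*) return 0 ;;     # exact match between commas
        *) return 1 ;;
    esac
}

# Check the current user against the docker group, if getent is available.
if command -v getent >/dev/null 2>&1; then
    if group_line_has_user "$(getent group docker)" "$(id -un)"; then
        echo "$(id -un) is in the docker group"
    else
        echo "$(id -un) is not listed in the docker group"
    fi
fi
```

Note that a newly added membership only takes effect in a new login session, so a user who was just added typically has to log out and back in before `docker` works without `sudo`.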
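Configure Docker proxies
Configuring Docker proxies requires an administrator user!
For setting up the network proxy itself, see 瞧瞧我对服务器干了些什么! - MurphyHou (cosmicdusty.cc)
1. Configure registry mirrors (many mirror servers no longer work)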
sudo vim /etc/docker/daemon.json

{
    "registry-mirrors": [
        "https://hub-mirror.c.163.com",
        "https://mirror.baidubce.com"
    ]
}

sudo systemctl daemon-reload
sudo systemctl restart docker
2. Proxy for docker pull
sudo mkdir -p /etc/systemd/system/docker.service.d
sudo touch /etc/systemd/system/docker.service.d/proxy.conf

# Contents of proxy.conf:
[Service]
Environment="HTTP_PROXY=http://127.0.0.1:7890/"
Environment="HTTPS_PROXY=http://127.0.0.1:7890/"
Environment="NO_PROXY=localhost,127.0.0.1,.example.com"

sudo systemctl daemon-reload
sudo systemctl restart docker
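The drop-in file can also be generated instead of edited by hand. The following is a sketch in which `docker_proxy_dropin` is a hypothetical helper; the proxy address and the NO_PROXY list are the ones from the listing above.

```shell
# Print a systemd drop-in that sets proxy variables for the Docker daemon.
# $1: proxy URL, $2: comma-separated NO_PROXY list
# (docker_proxy_dropin is a hypothetical helper name.)
docker_proxy_dropin() {
    cat <<EOF
[Service]
Environment="HTTP_PROXY=$1"
Environment="HTTPS_PROXY=$1"
Environment="NO_PROXY=$2"
EOF
}

# Print the file for review; this writes nothing by itself.
docker_proxy_dropin "http://127.0.0.1:7890/" "localhost,127.0.0.1,.example.com"
```

To install it, the output can be piped into `sudo tee /etc/systemd/system/docker.service.d/proxy.conf`, followed by the same `daemon-reload` and `restart` commands as above. Afterwards, `systemctl show --property=Environment docker` should list the proxy variables if the drop-in was picked up.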
3. Proxy for containers
vim ~/.docker/config.json

{
    "proxies": {
        "default": {
            "httpProxy": "http://127.0.0.1:7890",
            "httpsProxy": "http://127.0.0.1:7890",
            "noProxy": "localhost,127.0.0.1,.example.com"
        }
    }
}
Test whether the Docker setup works
Reference: Ubuntu | Docker 从入门到实践 (Docker: From Beginner to Practice), gitbook.io
docker run --rm hello-world
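Setting up Overleaf
Once the Docker environment above is in place, Overleaf can be set up. The network (proxy) configuration in particular must be working, otherwise the Docker images cannot be pulled.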
Setup
git clone https://github.com/overleaf/toolkit.git ./overleaf-toolkit && cd overleaf-toolkit

bin/init

bin/up
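Remote access
Because the service runs on a remote server, the listen IP and port have to be changed so that it can be reached directly from a local machine.
In ./config/overleaf.rc, change the following fields: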
OVERLEAF_LISTEN_IP=xx.xx.xx.xx  # IP of the remote server
OVERLEAF_PORT=80                # default is 80
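The same change can be scripted instead of edited by hand. This is a sketch using `sed`; `set_rc_value` is a hypothetical helper, and it assumes each setting sits on its own `KEY=value` line, as in the snippet above.

```shell
# Set KEY=value in an rc file, replacing an existing line or appending a new one.
# $1: file, $2: key, $3: new value (must not contain "|" or "&")
# (set_rc_value is a hypothetical helper name.)
set_rc_value() {
    if grep -q "^$2=" "$1"; then
        # Rewrite through a temporary file so it works with any sed.
        sed "s|^$2=.*|$2=$3|" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
    else
        printf '%s=%s\n' "$2" "$3" >> "$1"
    fi
}

# Example calls, run from the overleaf-toolkit directory
# (fill in the real server IP in place of the placeholder):
#   set_rc_value ./config/overleaf.rc OVERLEAF_LISTEN_IP xx.xx.xx.xx
#   set_rc_value ./config/overleaf.rc OVERLEAF_PORT 80
```

After changing these values the containers presumably need to be recreated, for example by running `bin/up` again, before the new address and port take effect.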
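Once the Overleaf container is running, open http://xx.xx.xx.xx:xx/launchpad to register an administrator account. That account can then be used to log in to the Overleaf platform.
Online tutorials also describe some more advanced configuration; I will set that up later as the need arises.
Afterword
The official Overleaf site gives free users only 20 s of compile time, and a document that exceeds the limit cannot be compiled. The only way around that is to pay, and if I actually ran into this limit I might well choose to pay too. But I saw online that you can host your own Overleaf on a server, so I wanted to follow a tutorial and try it myself. Going through the tutorial step by step, I did get it working in the end. I may never actually use this self-hosted instance, but the tinkering never stops. What if it comes in handy someday?!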