```
git config --global "SSH keys Name"
git config --global "SSH keys Email"
```
```
ssh-keygen -t rsa -C "Email of Github Account"
```
```
(base) houjinliang@3080server:~/userdoc/d2cv$ git config --global 'hjl_3080server'
(base) houjinliang@3080server:~/userdoc/d2cv$ git config --global ''
(base) houjinliang@3080server:~/userdoc/d2cv$ ssh-keygen -t rsa -C ""
Generating public/private rsa key pair.
Enter file in which to save the key (/mnt/houjinliang/.ssh/id_rsa):
Created directory '/mnt/houjinliang/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /mnt/houjinliang/.ssh/id_rsa.
Your public key has been saved in /mnt/houjinliang/.ssh/
```
```
(dlpy310pth113) houjinliang@3080server:~/.ssh$ pwd
/mnt/houjinliang/.ssh
(dlpy310pth113) houjinliang@3080server:~/.ssh$ ll
total 20
drwx------  2 houjinliang houjinliang 4096 Nov  1 10:19 ./
drwxr-xr-x 12 houjinliang houjinliang 4096 Nov  1 10:17 ../
-rw-------  1 houjinliang houjinliang 1675 Nov  1 10:17 id_rsa
-rw-r--r--  1 houjinliang houjinliang  407 Nov  1 10:17
-rw-r--r--  1 houjinliang houjinliang  444 Nov  1 10:19 known_hosts
```
Copy the contents of the public key file and add them as a new SSH key under Settings → SSH keys on GitHub.
```
ssh -T
Hi murphyhoucn! You've successfully authenticated, but GitHub does not provide shell access.
```
```
(base) houjinliang@3080server:~/userdoc$ git clone
(base) houjinliang@3080server:~/userdoc/DeepLearningforCV$ git status
(base) houjinliang@3080server:~/userdoc/DeepLearningforCV$ git add .
(base) houjinliang@3080server:~/userdoc/DeepLearningforCV$ git commit -m "add new file"
(base) houjinliang@3080server:~/userdoc/DeepLearningforCV$ git push
```
Check GPU usage
```
nvidia-smi
```
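For scripts, `nvidia-smi` can also print selected fields as CSV. The sketch below assumes the query fields `index,name,memory.used,memory.total,utilization.gpu` together with `--format=csv,noheader,nounits`. The helper names `summarize_gpus` and `least_used_gpu` are hypothetical. Both helpers read from stdin, so you can try them without a GPU.

```shell
# Hypothetical helpers for `nvidia-smi --query-gpu=... --format=csv,noheader,nounits` output.

# Expects lines like "0, NVIDIA GeForce RTX 3090, 1024, 24576, 7"
# (index, name, memory.used [MiB], memory.total [MiB], utilization.gpu [%]).
summarize_gpus() {
  awk -F', *' '{ printf "GPU %s (%s): %s/%s MiB, %s%% util\n", $1, $2, $3, $4, $5 }'
}

# Expects lines like "0, 1024" (index, memory.used) and prints the index
# of the GPU with the least memory in use (nothing for empty input).
least_used_gpu() {
  awk -F', *' 'NR == 1 || $2 < min { min = $2; idx = $1 } END { if (NR > 0) print idx }'
}

# Usage on a machine with the NVIDIA driver:
#   nvidia-smi --query-gpu=index,name,memory.used,memory.total,utilization.gpu \
#              --format=csv,noheader,nounits | summarize_gpus
#   export CUDA_VISIBLE_DEVICES=$(nvidia-smi --query-gpu=index,memory.used \
#              --format=csv,noheader,nounits | least_used_gpu)
```

Picking the GPU with the least memory in use is only a heuristic. It can help on a shared server, but it does not reserve the card.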
```
(dlpy310pth113) houjinliang@3080server:~/userdoc$ pip install gpustat
(dlpy310pth113) houjinliang@3080server:~/userdoc$ gpustat
```
```
(dlpy310pth113) houjinliang@3080server:~$ pip install nvitop
Requirement already satisfied: nvitop in ./miniconda3/envs/dlpy310pth113/lib/python3.10/site-packages (1.3.0)
Requirement already satisfied: nvidia-ml-py<12.536.0a0,>=11.450.51 in ./miniconda3/envs/dlpy310pth113/lib/python3.10/site-packages (from nvitop) (12.535.108)
Requirement already satisfied: psutil>=5.6.6 in ./miniconda3/envs/dlpy310pth113/lib/python3.10/site-packages (from nvitop) (5.9.5)
Requirement already satisfied: cachetools>=1.0.1 in ./miniconda3/envs/dlpy310pth113/lib/python3.10/site-packages (from nvitop) (5.3.1)
Requirement already satisfied: termcolor>=1.0.0 in ./miniconda3/envs/dlpy310pth113/lib/python3.10/site-packages (from nvitop) (2.3.0)
```
Clash for Linux
```
gunzip clash-linux-amd64-v1.18.0.gz
mv clash-linux-amd64-v1.18.0 clash
chmod u+x clash
./clash
```
Write the contents of your subscription into `~/.config/clash/config.yaml`.
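If you run `./clash` as shown above, it stays in the foreground. That blocks the terminal, and Clash usually exits when the SSH session closes. Below is a minimal sketch for starting it in the background instead. It assumes the binary is in the current directory and that the config lives in `~/.config/clash` (Clash's `-d` flag sets the configuration directory). `start_clash` and the log path `~/clash.log` are hypothetical.

```shell
# Hypothetical helper: start Clash in the background, detached from the terminal.
# $1: path to the clash binary (default ./clash)
# $2: config directory containing config.yaml (default ~/.config/clash)
# $3: log file (default ~/clash.log)
# Prints the PID of the background process.
start_clash() {
  local bin=${1:-./clash}
  local cfg=${2:-$HOME/.config/clash}
  local log=${3:-$HOME/clash.log}
  if [ ! -x "$bin" ]; then
    echo "not executable: $bin" >&2
    return 1
  fi
  # nohup keeps the process alive after logout; all output goes to the log.
  nohup "$bin" -d "$cfg" >"$log" 2>&1 &
  echo "$!"
}

# Usage:
#   pid=$(start_clash)
#   tail -f ~/clash.log     # check that the subscription config loaded
#   kill "$pid"             # stop it again
```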
```
# ~/.bashrc
function proxy () {
    export http_proxy=
    export https_proxy=$http_proxy
    echo -e "proxy on!"
}
function unproxy () {
    unset http_proxy https_proxy
    echo -e "proxy off"
}
```
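Here is a variant of these functions that keeps the proxy address in one variable. It assumes Clash serves HTTP on `http://127.0.0.1:7890`, a common default; check the port in your `config.yaml`. `PROXY_ADDR` is a hypothetical name. Each variable gets its own `export`. In a one-line `export http_proxy=X https_proxy=$http_proxy`, bash expands `$http_proxy` before either assignment runs, so `https_proxy` would get the old value.

```shell
# Hypothetical default: Clash's local HTTP port; override by setting PROXY_ADDR first.
PROXY_ADDR=${PROXY_ADDR:-http://127.0.0.1:7890}

proxy() {
  export http_proxy="$PROXY_ADDR"
  export https_proxy="$PROXY_ADDR"
  # Some tools only read the upper-case names.
  export HTTP_PROXY="$PROXY_ADDR"
  export HTTPS_PROXY="$PROXY_ADDR"
  # Keep local traffic off the proxy.
  export no_proxy="localhost,127.0.0.1"
  echo "proxy on: $PROXY_ADDR"
}

unproxy() {
  unset http_proxy https_proxy HTTP_PROXY HTTPS_PROXY no_proxy
  echo "proxy off"
}
```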
```
(base) houjinliang@3080server:~/userdoc$ source ~/.bashrc
(base) houjinliang@3080server:~/userdoc$ proxy
proxy on!
(base) houjinliang@3080server:~/userdoc$ unproxy
proxy off
```
```
(base) houjinliang@3080server:~/userdoc$ wget
URL transformed to HTTPS due to an HSTS policy
--2024-01-10 15:33:59--
Connecting to ... connected.
Proxy request sent, awaiting response... 302 Found
Location: // [following]
URL transformed to HTTPS due to an HSTS policy
--2024-01-10 15:33:59--
Reusing existing connection to ...
Proxy request sent, awaiting response... 200 OK
Length: 39879 (39K) [text/html]
Saving to: ‘index.html’

index.html          100%[===================>]  38.94K  --.-KB/s    in 0.04s

2024-01-10 15:33:59 (944 KB/s) - ‘index.html’ saved [39879/39879]

(base) houjinliang@3080server:~/userdoc$ wget
--2024-01-10 15:34:14--
Connecting to ... connected.
Proxy request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘index.html.1’

index.html.1            [ <=>                ]  18.72K  --.-KB/s    in 0.07s

2024-01-10 15:34:16 (257 KB/s) - ‘index.html.1’ saved [19169]
```
3080Server - MMDetection
Ubuntu 18.04.6 LTS
gcc version 7.5.0
CUDA 11.3
cuDNN 8.9.5
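CUDA 11.3.1 & cuDNN 8.9.5
I had originally installed CUDA 11.6, but later decided that version was a bit too new. After looking at some examples, I went back to CUDA 11.3. The first step was removing CUDA 11.6. I searched for a way to uninstall it but found nothing that worked, so I simply deleted the directory with `rm -rf cuda-11.6`. (A runfile installation normally also puts a `cuda-uninstaller` in the toolkit's `bin` directory, as the installer summary below points out.)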
CUDA Toolkit 11.3 Update 1 Downloads | NVIDIA Developer
```
wget
sudo sh
```
```
(base) houjinliang@3080server:~/userdoc/cuda_and_cudnn$ sh ./
= Summary =

Driver:   Not Selected
Toolkit:  Installed in /mnt/houjinliang/cuda-11.3/
Samples:  Not Selected

Please make sure that
 -   PATH includes /mnt/houjinliang/cuda-11.3/bin
 -   LD_LIBRARY_PATH includes /mnt/houjinliang/cuda-11.3/lib64, or, add /mnt/houjinliang/cuda-11.3/lib64 to /etc/ and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /mnt/houjinliang/cuda-11.3/bin
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 465.00 is required for CUDA 11.3 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
    sudo <CudaInstaller>.run --silent --driver

Logfile is /tmp/cuda-installer.log
```
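Here the installer only installs the toolkit. The driver that is already on the machine has to meet the minimum the installer prints (at least 465.00 for CUDA 11.3). Below is a small sketch for checking this with GNU `sort -V`. `version_ge` and `check_driver_for_cuda` are hypothetical helpers. The driver version is read with `nvidia-smi --query-gpu=driver_version`.

```shell
# Hypothetical helper: succeed if dotted version $1 >= $2 (GNU sort -V ordering).
version_ge() {
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical check against the minimum driver version printed by the installer.
check_driver_for_cuda() {
  local driver=$1 required=$2
  if version_ge "$driver" "$required"; then
    echo "OK: driver $driver >= $required"
  else
    echo "driver $driver is older than $required" >&2
    return 1
  fi
}

# On the server (needs the NVIDIA driver):
#   check_driver_for_cuda "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)" 465.00
```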
```
vim ~/.bashrc

# add to ~/.bashrc
export CUDA_HOME=/mnt/houjinliang/cuda-11.3   # a single directory, not a colon-separated list
export PATH=$PATH:/mnt/houjinliang/cuda-11.3/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/mnt/houjinliang/cuda-11.3/lib64
```
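The `export` lines above append to `PATH` and `LD_LIBRARY_PATH` unconditionally. Every `source ~/.bashrc` therefore adds another copy. If `LD_LIBRARY_PATH` was empty, the result also starts with a `:`, and an empty entry is generally treated as the current directory. Below is a sketch of an idempotent alternative for bash. `path_append` and `CUDA_ROOT` are hypothetical names.

```shell
# Hypothetical helper: append DIR to the colon-separated variable named VAR,
# but only if it is not already there, so re-sourcing ~/.bashrc does not
# keep growing PATH / LD_LIBRARY_PATH.
path_append() {
  local var=$1
  local dir=$2
  local cur=${!var}                 # bash indirect expansion: current value of $var
  case ":$cur:" in
    *":$dir:"*) ;;                  # already present, nothing to do
    *) cur=${cur:+$cur:}$dir ;;     # no leading ':' when the variable was empty
  esac
  export "$var=$cur"
}

# Toolkit location as used on the 3080 server above.
CUDA_ROOT=/mnt/houjinliang/cuda-11.3
export CUDA_HOME=$CUDA_ROOT         # one directory, not a list
path_append PATH "$CUDA_ROOT/bin"
path_append LD_LIBRARY_PATH "$CUDA_ROOT/lib64"
```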
```
(base) houjinliang@3080server:~$ source ~/.bashrc
(base) houjinliang@3080server:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_May__3_19:15:13_PDT_2021
Cuda compilation tools, release 11.3, V11.3.109
Build cuda_11.3.r11.3/compiler.29920130_0
```
```
(base) houjinliang@3080server:~/userdoc/cuda_and_cudnn$ tar xvJf cudnn-linux-x86_64-
(py38mmdetection) houjinliang@3080server:~/userdoc/cuda_and_cudnn/cudnn-linux-x86_64-$ ll
total 48
drwxr-xr-x 4 houjinliang houjinliang  4096 Aug  3  2022 ./
drwxrwxr-x 3 houjinliang houjinliang  4096 Jan  5 16:32 ../
drwxr-xr-x 2 houjinliang houjinliang  4096 Aug  3  2022 include/
drwxr-xr-x 2 houjinliang houjinliang  4096 Aug  3  2022 lib/
-rw-r--r-- 1 houjinliang houjinliang 28994 Aug  3  2022 LICENSE
(py38mmdetection) houjinliang@3080server:~/userdoc/cuda_and_cudnn/cudnn-linux-x86_64-$ cp lib/* ~/cuda-11.3/lib64/
(py38mmdetection) houjinliang@3080server:~/userdoc/cuda_and_cudnn/cudnn-linux-x86_64-$ cp include/* ~/cuda-11.3/include
chmod +x ~/cuda-11.3/include/cudnn.h
chmod +x ~/cuda-11.3/lib64/libcudnn*
(base) houjinliang@3080server:~$ cat ~/cuda-11.3/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
--
/* cannot use constexpr here since this is a C-only file */
```
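The `grep` above only caught a comment. The version itself lives in three `#define`s in `cudnn_version.h`. The sketch below wraps the copy step and the version check into two hypothetical bash functions. It differs from the commands above in two places, following NVIDIA's tar-file install instructions as I understand them. First, `cp -P` keeps the `libcudnn*.so*` symlinks as symlinks; plain `cp` copies each library once per link name. Second, `chmod a+r` grants read access, which is what headers and libraries actually need, rather than `+x`.

```shell
# Hypothetical helper: copy an extracted cuDNN tarball into a CUDA toolkit root.
# $1: extracted cudnn-linux-x86_64-... directory, $2: CUDA toolkit root
install_cudnn() {
  local src=$1 cuda=$2
  cp "$src"/include/cudnn*.h "$cuda/include/"
  cp -P "$src"/lib/libcudnn* "$cuda/lib64/"      # -P: copy symlinks as symlinks
  chmod a+r "$cuda"/include/cudnn*.h "$cuda"/lib64/libcudnn*
}

# Print the cuDNN version as MAJOR.MINOR.PATCHLEVEL from cudnn_version.h.
cudnn_version() {
  awk '/#define CUDNN_MAJOR /      { major = $3 }
       /#define CUDNN_MINOR /      { minor = $3 }
       /#define CUDNN_PATCHLEVEL / { patch = $3 }
       END { if (major != "") print major "." minor "." patch }' "$1"
}

# Usage (<extracted-dir> is a placeholder for the unpacked tarball):
#   install_cudnn <extracted-dir> ~/cuda-11.3
#   cudnn_version ~/cuda-11.3/include/cudnn_version.h
```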
PyTorch 1.11
```
(base) houjinliang@3080server:~$ conda create -n py38mmdetection python=3.8 -y
(base) houjinliang@3080server:~$ conda activate py38mmdetection
(py38mmdetection) houjinliang@3080server:~$ conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch
(py38mmdetection) houjinliang@3080server:~$ python
Python 3.8.18 (default, Sep 11 2023, 13:40:15)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.cuda.is_available())
True
```
Aliyun PyPI mirror
```
pip config set global.index-url
```
```
(base) houjinliang@3080server:~$ conda create -n py38mmyolo python=3.8
(base) houjinliang@3080server:~$ conda activate py38mmyolo
(py38mmyolo) houjinliang@3080server:~$ pip config list
global.index-url=''
(py38mmyolo) houjinliang@3080server:~$ conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch
(py38mmyolo) houjinliang@3080server:~$ python -c "import torch; print(torch.__version__); print(torch.cuda.is_available())"
1.11.0
True
```
```
pip install -U openmim
mim install "mmengine>=0.6.0"
mim install "mmcv>=2.0.0rc4,<2.1.0"
mim install "mmdet>=3.0.0,<4.0.0"
```
```
git clone
cd mmyolo
pip install -r requirements/albu.txt
# -v: verbose output; -e: editable install, so local code changes take effect without reinstalling
mim install -v -e .
```
```
(base) houjinliang@3080server:~/userdoc/offlinefile$ wget
--2024-01-10 16:17:46--
Resolving ...
Connecting to ...:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 815585330 (778M) [application/zip]
Saving to: ‘’

                    100%[===================>] 777.80M  3.89MB/s    in 2m 22s

2024-01-10 16:20:08 (5.48 MB/s) - ‘’ saved [815585330/815585330]
```
```
(py38mmyolo) houjinliang@3080server:~/userdoc/offlinefile$ ll
total 26251480
drwxrwxr-x 6 houjinliang houjinliang        4096 Jan 10 21:36 ./
drwxrwxr-x 9 houjinliang houjinliang        4096 Jan 10 15:59 ../
-rw-rw-r-- 1 houjinliang houjinliang     3996930 Jan 10 14:43 clash-linux-amd64-v1.18.0.gz
drwxr-xr-x 5 houjinliang houjinliang        4096 Aug 26  2022 coco/
-rw-rw-r-- 1 houjinliang houjinliang     6983030 Jan 10 17:00
-rw-rw-r-- 1 houjinliang houjinliang    48639045 Jan 10 16:21
-rw-rw-r-- 1 houjinliang houjinliang     4372979 Jan 10 14:48 curl-8.5.0.tar.gz
-rw-rw-r-- 1 houjinliang houjinliang    12353723 Jan  5 16:32 pandas-2.0.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
drwxrwxr-x 2 houjinliang houjinliang     1429504 Aug 31  2017 test2017/
-rw-rw-r-- 1 houjinliang houjinliang  6646970404 Jan 10 17:47
drwxrwxr-x 2 houjinliang houjinliang     4112384 Aug 31  2017 train2017/
-rw-rw-r-- 1 houjinliang houjinliang 19336861798 Jan 10 21:35
drwxrwxr-x 2 houjinliang houjinliang      167936 Aug 31  2017 val2017/
-rw-rw-r-- 1 houjinliang houjinliang   815585330 Jul 11  2018
(py38mmyolo) houjinliang@3080server:~/userdoc/offlinefile$ ll -hl
total 26G
drwxrwxr-x 6 houjinliang houjinliang 4.0K Jan 10 21:36 ./
drwxrwxr-x 9 houjinliang houjinliang 4.0K Jan 10 15:59 ../
-rw-rw-r-- 1 houjinliang houjinliang 3.9M Jan 10 14:43 clash-linux-amd64-v1.18.0.gz
drwxr-xr-x 5 houjinliang houjinliang 4.0K Aug 26  2022 coco/
-rw-rw-r-- 1 houjinliang houjinliang 6.7M Jan 10 17:00
-rw-rw-r-- 1 houjinliang houjinliang  47M Jan 10 16:21
-rw-rw-r-- 1 houjinliang houjinliang 4.2M Jan 10 14:48 curl-8.5.0.tar.gz
-rw-rw-r-- 1 houjinliang houjinliang  12M Jan  5 16:32 pandas-2.0.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
drwxrwxr-x 2 houjinliang houjinliang 1.4M Aug 31  2017 test2017/
-rw-rw-r-- 1 houjinliang houjinliang 6.2G Jan 10 17:47
drwxrwxr-x 2 houjinliang houjinliang 4.0M Aug 31  2017 train2017/
-rw-rw-r-- 1 houjinliang houjinliang  19G Jan 10 21:35
drwxrwxr-x 2 houjinliang houjinliang 164K Aug 31  2017 val2017/
-rw-rw-r-- 1 houjinliang houjinliang 778M Jul 11  2018
```
```
(py38mmyolo) houjinliang@3080server:~$ du -h --max-depth=1
6.5M    ./.config
8.0K    ./.conda
1.1G    ./.vscode-server
12G     ./cuda-11.3
86G     ./userdoc
8.0K    ./.gnupg
16K     ./.ssh
8.0K    ./.nv
2.7G    ./.cache
24G     ./miniconda3
125G    .
```
3090Server
System details
```
Welcome to Ubuntu 18.04.6 LTS (GNU/Linux 5.4.0-150-generic x86_64)
Model name:          Intel(R) Xeon(R) CPU E5-2699C v4 @ 2.20GHz
NVIDIA Corporation GA102 [GeForce RTX 3090] (rev a1)
```
NV Driver
```
(base) houjinliang@3090server:~$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  515.65.01  Wed Jul 20 14:00:58 UTC 2022
GCC version:  gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
```
Home directory
```
houjinliang@3090server:~$ ll
total 40
drwxr-xr-x  4 houjinliang houjinliang 4096 Jun 26 10:18 ./
drwxrwxrwx 21 super       super       4096 Jun 26 10:17 ../
-rw-r--r--  1 houjinliang houjinliang  220 Apr  5  2018 .bash_logout
-rw-r--r--  1 houjinliang houjinliang 3771 Apr  5  2018 .bashrc
drwx------  2 houjinliang houjinliang 4096 Jun 26 10:18 .cache/
-rw-r--r--  1 houjinliang houjinliang 8980 Apr 16  2018 examples.desktop
drwx------  3 houjinliang houjinliang 4096 Jun 26 10:18 .gnupg/
-rw-r--r--  1 houjinliang houjinliang  807 Apr  5  2018 .profile
```
Miniconda
Download the Miniconda installer script (.sh), make it executable, then run it.
```
houjinliang@3090server:~/MyDownloadFiles$ wget -c
houjinliang@3090server:~/MyDownloadFiles$ chmod +x
houjinliang@3090server:~/MyDownloadFiles$ ./
```
```
Miniconda3 will now be installed into this location:
/mnt/houjinliang/miniconda3

  - Press ENTER to confirm the location
  - Press CTRL-C to abort the installation
  - Or specify a different location below

[/mnt/houjinliang/miniconda3] >>>
```
If you accept the installer's offer to initialize conda, it configures `~/.bashrc` automatically.
If you decline, add the following to `~/.bashrc` manually:
```
(base) houjinliang@3090server:~$ vim ~/.bashrc

__conda_setup="$('/mnt/houjinliang/miniconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/mnt/houjinliang/miniconda3/etc/profile.d/" ]; then
        . "/mnt/houjinliang/miniconda3/etc/profile.d/"
    else
        export PATH="/mnt/houjinliang/miniconda3/bin:$PATH"
    fi
fi
unset __conda_setup

houjinliang@3090server:~$ source ~/.bashrc
(base) houjinliang@3090server:~$
```
```
(base) houjinliang@3090server:~$ conda info

     active environment : base
    active env location : /mnt/houjinliang/miniconda3
            shell level : 1
       user config file : /mnt/houjinliang/.condarc
 populated config files :
          conda version : 24.4.0
    conda-build version : not installed
         python version :
                 solver : libmamba (default)
       virtual packages : __archspec=1=broadwell
                          __conda=24.4.0=0
                          __cuda=11.7=0
                          __glibc=2.27=0
                          __linux=5.4.0=0
                          __unix=0=0
       base environment : /mnt/houjinliang/miniconda3  (writable)
      conda av data dir : /mnt/houjinliang/miniconda3/etc/conda
  conda av metadata url : None
           channel URLs :
          package cache : /mnt/houjinliang/miniconda3/pkgs
                          /mnt/houjinliang/.conda/pkgs
       envs directories : /mnt/houjinliang/miniconda3/envs
                          /mnt/houjinliang/.conda/envs
               platform : linux-64
             user-agent : conda/24.4.0 requests/2.31.0 CPython/3.12.3 Linux/5.4.0-150-generic ubuntu/18.04.6 glibc/2.27 solver/libmamba conda-libmamba-solver/24.1.0 libmambapy/1.5.8 aau/0.4.4 c/. s/. e/.
                UID:GID : 1035:1035
             netrc file : None
           offline mode : False
```
```
(base) houjinliang@3090server:~$ pip config set global.index-url
Writing to /mnt/houjinliang/.config/pip/pip.conf
```
Alternatively, edit `~/.config/pip/pip.conf` (create it if it does not exist) so that it contains the following:
```
(base) houjinliang@3090server:~$ cat ~/.config/pip/pip.conf
[global]
index-url =
```
NV Driver
```
(base) houjinliang@3080server:~$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  525.60.11  Wed Nov 23 23:04:03 UTC 2022
GCC version:  gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
```
CUDA 11.3.1 & cuDNN 8.9.5
This server uses the same CUDA version as the previous one, so the installation follows the same steps as above.
```
(base) houjinliang@3090server:~/MyDownloadFiles$ wget
(base) houjinliang@3090server:~/MyDownloadFiles$ ll
total 3224920
drwxrwxr-x  2 houjinliang houjinliang       4096 Jun 26 11:10 ./
drwxr-xr-x 10 houjinliang houjinliang       4096 Jun 26 11:04 ../
-rw-rw-r--  1 houjinliang houjinliang 3158494112 May 14  2021
-rwxrwxr-x  1 houjinliang houjinliang  143808873 May 21 02:15 *
(base) houjinliang@3090server:~/MyDownloadFiles$ chmod +x
(base) houjinliang@3090server:~/MyDownloadFiles$ ll
total 3224920
drwxrwxr-x  2 houjinliang houjinliang       4096 Jun 26 11:10 ./
drwxr-xr-x 10 houjinliang houjinliang       4096 Jun 26 11:04 ../
-rwxrwxr-x  1 houjinliang houjinliang 3158494112 May 14  2021 *
-rwxrwxr-x  1 houjinliang houjinliang  143808873 May 21 02:15 *
(base) houjinliang@3090server:~/MyDownloadFiles$ ./
```
```
(base) houjinliang@3090server:~/MyDownloadFiles$ ./
===========
= Summary =
===========

Driver:   Not Selected
Toolkit:  Installed in /mnt/houjinliang/cuda-11.3/
Samples:  Not Selected

Please make sure that
 -   PATH includes /mnt/houjinliang/cuda-11.3/bin
 -   LD_LIBRARY_PATH includes /mnt/houjinliang/cuda-11.3/lib64, or, add /mnt/houjinliang/cuda-11.3/lib64 to /etc/ and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /mnt/houjinliang/cuda-11.3/bin
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 465.00 is required for CUDA 11.3 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
    sudo <CudaInstaller>.run --silent --driver

Logfile is /tmp/cuda-installer.log
```
Configure the CUDA Toolkit environment variables, using vim or VS Code:
```
(base) houjinliang@3090server:~$ vim ~/.bashrc

export CUDA_HOME=/mnt/houjinliang/cuda-11.3   # a single directory, not a colon-separated list
export PATH=$PATH:/mnt/houjinliang/cuda-11.3/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/mnt/houjinliang/cuda-11.3/lib64

(base) houjinliang@3090server:~$ source ~/.bashrc
(base) houjinliang@3090server:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_May__3_19:15:13_PDT_2021
Cuda compilation tools, release 11.3, V11.3.109
Build cuda_11.3.r11.3/compiler.29920130_0
```
```
(base) houjinliang@3090server:~/MyDownloadFiles$ ll
total 4062292
drwxrwxr-x  2 houjinliang houjinliang       4096 Jun 26 11:56 ./
drwxr-xr-x 11 houjinliang houjinliang       4096 Jun 26 11:48 ../
-rwxrwxr-x  1 houjinliang houjinliang 3158494112 May 14  2021 *
-rw-rw-r--  1 houjinliang houjinliang  857460936 Jun 26 11:57 cudnn-linux-x86_64-
-rwxrwxr-x  1 houjinliang houjinliang  143808873 May 21 02:15 *
(base) houjinliang@3090server:~/MyDownloadFiles$ tar xvJf cudnn-linux-x86_64-
cudnn-linux-x86_64-
cudnn-linux-x86_64-
...
cudnn-linux-x86_64-
(base) houjinliang@3090server:~/MyDownloadFiles$ cd cudnn-linux-x86_64-
(base) houjinliang@3090server:~/MyDownloadFiles/cudnn-linux-x86_64-$ ll
total 48
drwxr-xr-x 4 houjinliang houjinliang  4096 Sep  7  2023 ./
drwxrwxr-x 3 houjinliang houjinliang  4096 Jun 26 11:58 ../
drwxr-xr-x 2 houjinliang houjinliang  4096 Sep  7  2023 include/
drwxr-xr-x 2 houjinliang houjinliang  4096 Sep  7  2023 lib/
-rw-r--r-- 1 houjinliang houjinliang 29662 Sep  7  2023 LICENSE
(base) houjinliang@3090server:~/MyDownloadFiles/cudnn-linux-x86_64-$ cp lib/* ~/cuda-11.3/lib64/
(base) houjinliang@3090server:~/MyDownloadFiles/cudnn-linux-x86_64-$ cp include/* ~/cuda-11.3/include
(base) houjinliang@3090server:~/MyDownloadFiles/cudnn-linux-x86_64-$ chmod +x ~/cuda-11.3/include/cudnn.h
(base) houjinliang@3090server:~/MyDownloadFiles/cudnn-linux-x86_64-$ chmod +x ~/cuda-11.3/lib64/libcudnn*
(base) houjinliang@3090server:~/MyDownloadFiles/cudnn-linux-x86_64-$ cat ~/cuda-11.3/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
--
/* cannot use constexpr here since this is a C-only file */
```
Git & GitHub
```
(base) houjinliang@3090server:~$ git config --global 'hjl_3090server'
(base) houjinliang@3090server:~$ git config --global ''
(base) houjinliang@3090server:~$ ssh-keygen -t rsa -C ""
Generating public/private rsa key pair.
Enter file in which to save the key (/mnt/houjinliang/.ssh/id_rsa):
Created directory '/mnt/houjinliang/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /mnt/houjinliang/.ssh/id_rsa.
Your public key has been saved in /mnt/houjinliang/.ssh/
```
```
(base) houjinliang@3090server:~/.ssh$ pwd
/mnt/houjinliang/.ssh
(base) houjinliang@3090server:~/.ssh$ ll
total 16
drwx------  2 houjinliang houjinliang 4096 Jun 26 12:11 ./
drwxr-xr-x 12 houjinliang houjinliang 4096 Jun 26 12:11 ../
-rw-------  1 houjinliang houjinliang 1679 Jun 26 12:11 id_rsa
-rw-r--r--  1 houjinliang houjinliang  407 Jun 26 12:11
```
```
(base) houjinliang@3090server:~$ git config
hjl_3090server
(base) houjinliang@3090server:~$ git config

(base) houjinliang@3090server:~$ ssh -T
The authenticity of host ' (' can't be established.
ECDSA key fingerprint is xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added ',' (ECDSA) to the list of known hosts.
Hi murphyhoucn! You've successfully authenticated, but GitHub does not provide shell access.
```
3090Server2
NV Driver
```
(base) houjinliang@3090server2:~$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  535.183.01  Sun May 12 19:39:15 UTC 2024
GCC version:  gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.2)
```
4090Server
Ubuntu 22.04.2 LTS
gcc 11.4.0
CUDA 11.6:
cuDNN 8.9.5: cudnn-linux-x86_64-
NV Driver
```
(sr_benchmark) houjinliang@4090server:~$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  535.183.06  Wed Jun 26 06:46:07 UTC 2024
GCC version:  gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04)
```
CUDA 11.6 & cuDNN 8.9.5
```
(base) houjinliang@4090server:~/MyDownloadFiles$ ./
(base) houjinliang@3090server:~/MyDownloadFiles$ cd cudnn-linux-x86_64-
```
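As for the conda environment, I packed the environment from the previous server with conda-pack, copied it over with scp, and extracted it into the corresponding folder. The environment was built for CUDA 11.3 and its PyTorch is a cu113 build as well, yet it also works on this CUDA 11.6 server (so I'll just keep using it for now?!). This is plausible because PyTorch's `+cu113` builds bundle their own CUDA runtime libraries. What mainly matters is that the installed driver is new enough, not which system-wide toolkit version is installed.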
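Below is a sketch of the conda-pack round trip, assuming conda-pack is installed on the source machine. `conda pack -n <env> -o <file>.tar.gz` creates the archive. On the target machine, the archive is extracted into `envs/<name>`. Then the bundled `conda-unpack` script is run inside the activated environment; it rewrites prefixes that still point at the old machine. The function name `unpack_conda_env` and the scp destination host are hypothetical.

```shell
# Source machine (conda-pack installed); the scp host name is hypothetical:
#   conda pack -n sr_benchmark -o sr_benchmark.tar.gz
#   scp sr_benchmark.tar.gz houjinliang@new-server:~/MyDownloadFiles/

# Target machine. Hypothetical helper:
# $1: archive produced by `conda pack`, $2: destination env directory.
unpack_conda_env() {
  local archive=$1 dest=$2
  mkdir -p "$dest"
  tar -xzf "$archive" -C "$dest"
  # conda-pack ships bin/activate and bin/conda-unpack inside the archive;
  # conda-unpack rewrites prefixes that still point at the source machine.
  # The subshell keeps the activation from leaking into the caller's shell.
  if [ -f "$dest/bin/conda-unpack" ]; then
    (
      source "$dest/bin/activate"
      conda-unpack
    )
  fi
}

# Example with the paths used on 4090server2:
#   unpack_conda_env ~/MyDownloadFiles/sr_benchmark.tar.gz ~/miniconda3/envs/sr_benchmark
#   conda activate sr_benchmark
```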
Problem: Failed to initialize NVML: Driver/library version mismatch
The environment had been running fine for a long time, but one day this error suddenly appeared when I ran a program:
```
ERROR: cuda is not available, try running on CPU
```
```
(base) houjinliang@4090server:~$ nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
NVML library version: 535.216
(base) houjinliang@4090server:~$ nvitop
NVML ERROR: RM has detected an NVML/RM version mismatch.
(base) houjinliang@4090server:~$ gpustat
Error on querying NVIDIA devices. Use --debug flag to see more details.
RM has detected an NVML/RM version mismatch.
(base) houjinliang@4090server:~$ gpustat --debug
Error on querying NVIDIA devices. Use --debug flag to see more details.
RM has detected an NVML/RM version mismatch.
Traceback (most recent call last):
  File "/mnt/houjinliang/miniconda3/lib/python3.12/site-packages/gpustat/", line 58, in print_gpustat
    gpu_stats = GPUStatCollection.new_query(debug=debug, id=id)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/houjinliang/miniconda3/lib/python3.12/site-packages/gpustat/", line 402, in new_query
    N.nvmlInit()
  File "/mnt/houjinliang/miniconda3/lib/python3.12/site-packages/", line 1947, in nvmlInit
    nvmlInitWithFlags(0)
  File "/mnt/houjinliang/miniconda3/lib/python3.12/site-packages/", line 1937, in nvmlInitWithFlags
    _nvmlCheckReturn(ret)
  File "/mnt/houjinliang/miniconda3/lib/python3.12/site-packages/", line 899, in _nvmlCheckReturn
    raise NVMLError(ret)
pynvml.NVMLError_LibRmVersionMismatch: RM has detected an NVML/RM version mismatch.
(sr_benchmark) houjinliang@4090server:~$ python
Python 3.8.19 (default, Mar 20 2024, 19:58:24)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.cuda.is_available())
/mnt/houjinliang/miniconda3/envs/sr_benchmark/lib/python3.8/site-packages/torch/cuda/ UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:112.)
  return torch._C._cuda_getDeviceCount() > 0
False
```
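The two version numbers already point at the likely cause. NVML reports library version 535.216, while `/proc/driver/nvidia/version` on this server showed kernel module 535.183.06 earlier. This error typically means the user-space driver libraries were upgraded, for example by a package update, while the old kernel module is still loaded. Rebooting, or unloading and reloading the nvidia kernel modules, usually brings them back in sync. Below is a small diagnostic sketch. It assumes Ubuntu's library location `/usr/lib/x86_64-linux-gnu`, and all function names are hypothetical.

```shell
# Hypothetical diagnostics for "Failed to initialize NVML: Driver/library version mismatch".

# Version of the currently loaded kernel module
# (the number after "Kernel Module" in /proc/driver/nvidia/version).
kernel_module_version() {
  local file=${1:-/proc/driver/nvidia/version}
  sed -n 's/.*Kernel Module *\([0-9][0-9.]*\).*/\1/p' "$file" | head -n 1
}

# Highest versioned libnvidia-ml.so.X.Y[.Z] in a library directory
# (assumed Ubuntu default: /usr/lib/x86_64-linux-gnu).
nvml_library_version() {
  local libdir=${1:-/usr/lib/x86_64-linux-gnu}
  printf '%s\n' "$libdir"/libnvidia-ml.so.*.* \
    | sed -n 's/.*libnvidia-ml\.so\.\([0-9][0-9.]*\)$/\1/p' \
    | sort -V | tail -n 1
}

# Exit status: 0 = versions match, 1 = mismatch, 2 = could not determine.
check_nvml_versions() {
  local kmod lib
  kmod=$(kernel_module_version "$1")
  lib=$(nvml_library_version "$2")
  if [ -z "$kmod" ] || [ -z "$lib" ]; then
    echo "unknown (kernel module: '${kmod}', NVML library: '${lib}')"
    return 2
  fi
  if [ "$kmod" = "$lib" ]; then
    echo "match: $kmod"
    return 0
  fi
  echo "mismatch: kernel module $kmod, NVML library $lib"
  return 1
}

# Usage on the affected server:
#   check_nvml_versions /proc/driver/nvidia/version /usr/lib/x86_64-linux-gnu
```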
4090Server2
Ubuntu 22.04.3 LTS
gcc version 12.3.0
NV Driver
```
(base) houjinliang@4090server2:~$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  550.107.02  Wed Jul 24 23:53:00 UTC 2024
GCC version:  gcc version 12.3.0 (Ubuntu 12.3.0-1ubuntu1~22.04)
```
CUDA 12.4.1 & cuDNN 8.9.7
CUDA 12.4.1: CUDA Toolkit 12.4 Update 1 Downloads | NVIDIA Developer
```
(base) houjinliang@4090server2:~/MyDownloadFiles$ wget
```
```
(base) houjinliang@4090server2:~/MyDownloadFiles$ vim ~/.bashrc
```
```
# >>> cuda environment variables >>>
# murpy insert
export CUDA_HOME=/data/houjinliang/cuda-12.4   # a single directory, not a colon-separated list
export PATH=$PATH:/data/houjinliang/cuda-12.4/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/data/houjinliang/cuda-12.4/lib64
# <<< cuda environment variables <<<
```
```
(base) houjinliang@4090server2:~/MyDownloadFiles$ source ~/.bashrc
(base) houjinliang@4090server2:~/MyDownloadFiles$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0
```
cuDNN: cudnn-linux-x86_64-
```
(base) houjinliang@4090server2:~/MyDownloadFiles$ tar xvJf cudnn-linux-x86_64-
(base) houjinliang@4090server2:~/MyDownloadFiles$ cd cudnn-linux-x86_64-
(base) houjinliang@4090server2:~/MyDownloadFiles/cudnn-linux-x86_64-$ ll
total 48
drwxr-xr-x 4 houjinliang houjinliang  4096 Nov 30  2023 ./
drwxrwxr-x 3 houjinliang houjinliang  4096 Oct 24 22:53 ../
drwxr-xr-x 2 houjinliang houjinliang  4096 Nov 30  2023 include/
drwxr-xr-x 2 houjinliang houjinliang  4096 Nov 30  2023 lib/
-rw-r--r-- 1 houjinliang houjinliang 29662 Nov 30  2023 LICENSE
(base) houjinliang@4090server2:~/MyDownloadFiles/cudnn-linux-x86_64-$ cp lib/* ~/cuda-12.4/lib64/
(base) houjinliang@4090server2:~/MyDownloadFiles/cudnn-linux-x86_64-$ cp include/* ~/cuda-12.4/include
(base) houjinliang@4090server2:~/MyDownloadFiles/cudnn-linux-x86_64-$ chmod +x ~/cuda-12.4/include/cudnn.h
(base) houjinliang@4090server2:~/MyDownloadFiles/cudnn-linux-x86_64-$ chmod +x ~/cuda-12.4/lib64/libcudnn*
(base) houjinliang@4090server2:~/MyDownloadFiles/cudnn-linux-x86_64-$ cat ~/cuda-12.4/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
--
/* cannot use constexpr here since this is a C-only file */
```
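git install
There is no git on this server, so I installed it from a .deb package, extracting it into my home directory with `dpkg -x` (which needs no root privileges):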
```
(base) houjinliang@4090server2:~/MyDownloadFiles$ wget
```
```
(base) houjinliang@4090server2:~/MyDownloadFiles$ cd ~
(base) houjinliang@4090server2:~$ mkdir git
(base) houjinliang@4090server2:~$ dpkg -x ./MyDownloadFiles/git_2.34.1-1ubuntu1.11_amd64.deb ./git
(base) houjinliang@4090server2:~$ cd git/
(base) houjinliang@4090server2:~/git$ ll
total 20
drwxr-xr-x  5 houjinliang houjinliang 4096 May 20 20:14 ./
drwxr-x--- 14 houjinliang houjinliang 4096 Oct 24 23:22 ../
drwxr-xr-x  3 houjinliang houjinliang 4096 May 20 20:14 etc/
drwxr-xr-x  5 houjinliang houjinliang 4096 May 20 20:14 usr/
drwxr-xr-x  3 houjinliang houjinliang 4096 May 20 20:14 var/
```
```
(base) houjinliang@4090server2:~$ vim ~/.bashrc
```
```
# add to ~/.bashrc
export PATH=$PATH:~/git/usr/bin
export GIT_EXEC_PATH=~/git/usr/lib/git-core
```
```
(base) houjinliang@4090server2:~$ source ~/.bashrc
```
```
(base) houjinliang@4090server2:~$ git --version
git version 2.34.1
```
git configuration
```
(base) houjinliang@4090server2:~$ git config --global 'hjl_4090server2'
(base) houjinliang@4090server2:~$ git config --global ''
(base) houjinliang@4090server2:~$ ssh-keygen -t rsa -C ""
Generating public/private rsa key pair.
Enter file in which to save the key (/data/houjinliang/.ssh/id_rsa):
Created directory '/data/houjinliang/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /data/houjinliang/.ssh/id_rsa
Your public key has been saved in /data/houjinliang/.ssh/
(base) houjinliang@4090server2:~$ cat ~/.ssh/
```
```
(base) houjinliang@4090server2:~$ git config
hjl_4090server2
(base) houjinliang@4090server2:~$ git config

(base) houjinliang@4090server2:~$ ssh -T
The authenticity of host ' (' can't be established.
ED25519 key fingerprint is SHA256:+DiY3wvvV6TuJJhbpZisF/zLDA0zPMSvHdkr4UvCOqU.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '' (ED25519) to the list of known hosts.
Hi murphyhoucn! You've successfully authenticated, but GitHub does not provide shell access.
```
conda env
The CUDA toolkit on 4090server2 is version 12.4, but I still used the sr_benchmark environment that I had set up on the 3080 server.
```
(base) houjinliang@4090server2:~$ mkdir ~/miniconda3/envs/sr_benchmark
(base) houjinliang@4090server2:~$ tar -xzvf ./MyDownloadFiles/sr_benchmark.tar.gz -C ~/miniconda3/envs/sr_benchmark
(base) houjinliang@4090server2:~$ conda env list
base                  *  /data/houjinliang/miniconda3
sr_benchmark             /data/houjinliang/miniconda3/envs/sr_benchmark

(base) houjinliang@4090server2:~$
(base) houjinliang@4090server2:~$ conda activate sr_benchmark
(sr_benchmark) houjinliang@4090server2:~$ python
Python 3.8.19 (default, Mar 20 2024, 19:58:24)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.cuda.is_available())
True
>>>

torch                    1.10.1+cu113
torchvision              0.11.2+cu113
```
Docker Install
```
sudo apt update
sudo apt install \
    apt-transport-https \
    ca-certificates \
    curl \
    gnupg \
    lsb-release

curl -fsSL | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg

echo \
  "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

sudo apt update
sudo apt install docker-ce docker-ce-cli

sudo systemctl enable docker
sudo systemctl start docker
```
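```
# create the docker group (it may already exist after installing docker-ce)
sudo groupadd docker
# add the current user to the group
sudo usermod -aG docker $USER
# add another user (replace xxxxxxxx with the user name)
sudo usermod -aG docker xxxxxxxx

# check the members of the docker group
getent group docker
grep '^docker:' /etc/group
```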
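`usermod -aG` changes the group database, but a shell that is already logged in keeps its old group list. The new group only applies after the next login, or in a shell started with `newgrp docker`. Below is a sketch of a hypothetical `in_group` helper that checks the database through `id -nG USER`. It helps tell "not added" apart from "added, but not logged in again yet".

```shell
# Hypothetical helper: succeed if USER belongs to GROUP in the group database.
# `id -nG USER` reads the database, while plain `id -nG` shows the groups of
# the current session, which only change after a new login.
in_group() {
  local user=$1 group=$2
  id -nG "$user" | tr ' ' '\n' | grep -qx -- "$group"
}

# Usage after `sudo usermod -aG docker $USER`:
#   in_group "$USER" docker && echo "added to the docker group"
#   id -nG | tr ' ' '\n' | grep -qx docker || echo "not active in this shell yet: log in again or run newgrp docker"
```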
Configuring the Docker proxy requires an administrator (sudo) account!
For the internet proxy itself, see my earlier post 瞧瞧我对服务器干了些什么! ("Look what I did to the server!") - MurphyHou.
```
vim /etc/docker/daemon.json

{
  "registry-mirrors": [
    "",
    ""
  ]
}

sudo systemctl daemon-reload
sudo systemctl restart docker
```
2. Proxy for docker pull
```
sudo mkdir -p /etc/systemd/system/docker.service.d
sudo touch /etc/systemd/system/docker.service.d/proxy.conf

# contents of proxy.conf:
[Service]
Environment="HTTP_PROXY="
Environment="HTTPS_PROXY="
Environment="NO_PROXY=localhost,,"

sudo systemctl daemon-reload
sudo systemctl restart docker
```
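You can also generate the same drop-in file instead of editing it by hand. The sketch below assumes Clash's local port `http://127.0.0.1:7890` as the proxy; that address is hypothetical, so substitute your own. `write_docker_proxy_conf` is a hypothetical helper. It takes the target directory as an argument, so you can try it in a scratch directory first.

```shell
# Hypothetical helper: write a systemd drop-in that passes proxy variables to dockerd.
# $1: drop-in directory (normally /etc/systemd/system/docker.service.d)
# $2: proxy URL, $3: NO_PROXY list (default: localhost,127.0.0.1)
write_docker_proxy_conf() {
  local dir=$1 proxy=$2 noproxy=${3:-localhost,127.0.0.1}
  mkdir -p "$dir"
  cat > "$dir/proxy.conf" <<EOF
[Service]
Environment="HTTP_PROXY=$proxy"
Environment="HTTPS_PROXY=$proxy"
Environment="NO_PROXY=$noproxy"
EOF
}

# Real use (root is needed for /etc/systemd):
#   sudo bash -c "$(declare -f write_docker_proxy_conf); write_docker_proxy_conf /etc/systemd/system/docker.service.d http://127.0.0.1:7890"
#   sudo systemctl daemon-reload && sudo systemctl restart docker
#   systemctl show --property=Environment docker    # verify the variables were picked up
```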
```
vim ~/.docker/config.json

{
  "proxies": {
    "default": {
      "httpProxy": "",
      "httpsProxy": "",
      "noProxy": "localhost,,"
    }
  }
}
```
```
docker run --rm hello-world
```
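Once the Docker environment above is set up, you can deploy Overleaf. The network in particular has to be configured properly; otherwise the Docker images cannot be pulled.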
Setup
```
git clone ./overleaf-toolkit && cd overleaf-toolkit
bin/init
bin/up
```
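Remote access
The service runs on a remote server. To open it directly from a local machine, you need to change the listen address and port so that the service is reachable from outside: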
```
OVERLEAF_LISTEN_IP=xx.xx.xx.xx  # IP of the remote server
OVERLEAF_PORT=80                # default: 80
```
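You can also set these two values from the shell instead of editing the file by hand. The sketch below assumes the toolkit keeps them in `config/overleaf.rc` (created by `bin/init`) as plain `KEY=value` lines. `set_rc_var` is a hypothetical helper that relies on GNU `sed -i`. After changing the file, re-run `bin/up` so the new address and port take effect.

```shell
# Hypothetical helper: set KEY=VALUE in a simple KEY=value file, replacing an
# existing line for KEY or appending one if it is missing.
# Note: VALUE must not contain '|' or '&' (they are special in the sed expression).
set_rc_var() {
  local file=$1 key=$2 value=$3
  if grep -q "^${key}=" "$file"; then
    sed -i "s|^${key}=.*|${key}=${value}|" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Usage inside the overleaf-toolkit directory (xx.xx.xx.xx = the server's IP):
#   set_rc_var config/overleaf.rc OVERLEAF_LISTEN_IP xx.xx.xx.xx
#   set_rc_var config/overleaf.rc OVERLEAF_PORT 80
#   bin/up
```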
Once the Overleaf containers are up, open http://xx.xx.xx.xx:xx/launchpad to register an administrator account. You can then log in to the Overleaf platform with that account.
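Afterword
The official Overleaf site gives free users only 20 s of compile time. A document that takes longer simply fails to compile, and the only remedy there is to pay. Facing that situation myself, I might well choose to pay too. But I read online that you can host your own Overleaf on a server, so I wanted to try it by following a tutorial. I went through it step by step and got it working in the end. I may never actually use this self-hosted instance, but the tinkering never stops, and who knows, it might come in handy one day!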