While installing Caffe I ran into several bugs that forced me to reinstall CUDA and cuDNN. The NVIDIA driver was already installed.

1. Installing CUDA, cuDNN, and the NVIDIA driver

1.1 The struggle begins

The Caffe problems broke CUDA, and CUDA then refused to reinstall, so I had to try version 11.0 again.

# Download the CUDA 11.0 toolkit for Ubuntu 20.04 from the official site
wget https://developer.download.nvidia.com/compute/cuda/11.0.3/local_installers/cuda_11.0.3_450.51.
sudo sh cuda_11.0.3_450.51.06_linux.run

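The installer kept reporting a problem, and the error persisted even after I removed the existing driver. At first I decided to give up on 11.0. Then, because the error concerned the driver, I tried leaving the driver out of the installation, and that succeeded.

1.2 NVIDIA driver: version 460 is the recommended one to install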

1.3 After the driver installs successfully:

nvidia-smi
# reports CUDA 11.2 and NVIDIA driver 460
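
One thing worth keeping in mind here: the CUDA version printed by nvidia-smi is the highest CUDA version the driver supports, not the version of the installed toolkit. The sketch below checks that a toolkit version fits within what the driver reports. The `version_le` helper is a hypothetical name of mine, it assumes GNU `sort -V` is available, and the two version numbers are simply the ones that appear in this post.

```shell
#!/bin/sh
# version_le A B: succeeds when version A <= version B.
# Assumption: `sort -V` (GNU coreutils version sort) is available.
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Values taken from this post: toolkit 11.0, nvidia-smi reports 11.2.
toolkit=11.0
driver_max=11.2
if version_le "$toolkit" "$driver_max"; then
    echo "toolkit $toolkit is within the driver's supported range ($driver_max)"
else
    echo "toolkit $toolkit is newer than the driver supports ($driver_max)"
fi
```

Comparing with `sort -V` rather than plain string comparison keeps versions such as 11.10 and 11.2 in the right order.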

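1.4 Add the environment variables after installation

# Open ~/.bashrc and append the three export lines below
gedit ~/.bashrc
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
export PATH=$PATH:/usr/local/cuda/bin
# CUDA_HOME is a single directory, not a colon-separated list
export CUDA_HOME=/usr/local/cuda
# Reload the configuration in the current shell
source ~/.bashrc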
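
Because ~/.bashrc is sourced again in every new shell (and each time `source ~/.bashrc` is run by hand), plain `VAR=$VAR:dir` exports keep appending the same directory. A minimal sketch of a guard against that, assuming the default /usr/local/cuda location used above; `append_once` is a hypothetical helper name, not something CUDA provides.

```shell
#!/bin/sh
# append_once LIST DIR: print LIST with DIR appended, unless DIR is
# already one of its colon-separated entries.
append_once() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;      # DIR already present
        "::")     printf '%s\n' "$2" ;;      # LIST was empty
        *)        printf '%s\n' "$1:$2" ;;   # append DIR
    esac
}

PATH=$(append_once "$PATH" /usr/local/cuda/bin)
LD_LIBRARY_PATH=$(append_once "${LD_LIBRARY_PATH:-}" /usr/local/cuda/lib64)
export PATH LD_LIBRARY_PATH
```

The empty-list case also avoids a leading colon in LD_LIBRARY_PATH, which the dynamic loader would otherwise treat as the current directory.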

1.5 Testing CUDA

Sure enough, another bug appeared.

cd /usr/local/cuda/samples/1_Utilities/deviceQuery
sudo make
./deviceQuery

/usr/local/cuda/bin/nvcc -ccbin g++ -I../../common/inc  -m64    -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_80,code=compute_80 -o deviceQuery.o -c deviceQuery.cpp
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
g++: No such file or directory
nvcc fatal : Failed to preprocess host compiler properties.
make: *** [Makefile:309:deviceQuery.o] Error 1
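
The "g++: No such file or directory" line means nvcc could not find the host compiler it was told to use with `-ccbin g++`. A small sketch of a check that could be run before `make` to catch this earlier; `missing_tools` is a hypothetical helper name.

```shell
#!/bin/sh
# missing_tools NAME...: print each NAME that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing=$(missing_tools nvcc g++ make)
if [ -n "$missing" ]; then
    # Unquoted on purpose so the names are joined onto one line.
    echo "not found on PATH:" $missing
else
    echo "nvcc, g++ and make are all available"
fi
```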
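
1.5 g++ was not actually installed; installing it through build-essential ran into a dependency problem

The following packages have unmet dependencies:
 libc6-dev : Depends: libc6 (= 2.31-0ubuntu9) but 2.31-0ubuntu9.2 is to be installed
E: Unable to correct problems, you have held broken packages.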
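
1.6 Resolving the dependency problem with aptitude

Once aptitude found a workable solution, g++ installed successfully. Running sudo make now produced only warnings and no errors, but the output of deviceQuery showed yet another failure:
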
./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 999
-> unknown error
Result = FAIL
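
Return code 999 is cudaErrorUnknown, so the message itself says little. One cause I would check first (this is an assumption on my part, the real cause here was never identified) is that the NVIDIA kernel module is not loaded, for example after reinstalling the driver without rebooting. A minimal sketch of that check; `module_loaded` is a hypothetical helper that reads /proc/modules, where each line starts with a module name.

```shell
#!/bin/sh
# module_loaded NAME [FILE]: succeeds when kernel module NAME is listed in
# FILE, which defaults to /proc/modules (one module per line, name first).
module_loaded() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }' \
        "${2:-/proc/modules}"
}

if [ -r /proc/modules ]; then
    if module_loaded nvidia; then
        echo "nvidia kernel module is loaded"
    else
        echo "nvidia kernel module is not loaded; try 'sudo modprobe nvidia' or a reboot"
    fi
fi
```

If the module is loaded and the error persists, the cause lies elsewhere and this check does not help further.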

1.7 Trying again

Out of options, I uninstalled it and tried version 11.02, and it failed again.

/usr/local/cuda/bin/nvcc -ccbin g++ -I../../Common  -m64    --threads 0 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_86,code=compute_86 -o deviceQuery.o -c deviceQuery.cpp
nvcc fatal : Unknown option '--threads'
make: *** [Makefile:323:deviceQuery.o] Error 1

However, nvcc -V reports the expected version. I could not resolve the error and could only move on to the next step.

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Thu_Jun_11_22:26:38_PDT_2020
Cuda compilation tools, release 11.0, V11.0.194
Build cuda_11.0_bu.TC445_37.28540450_0
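
A possible explanation for the '--threads' error, not something confirmed in this post: as far as I know nvcc only gained the `--threads` option in CUDA 11.2, so a samples Makefile from 11.2 or later fails when the nvcc on PATH is still release 11.0, as shown above. The sketch below extracts the release number from `nvcc -V` so it can be compared with the samples being built; `nvcc_release` is a hypothetical helper name.

```shell
#!/bin/sh
# nvcc_release: read `nvcc -V` output on stdin and print the release number
# from the "Cuda compilation tools, release X.Y, ..." line.
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

if command -v nvcc >/dev/null 2>&1; then
    echo "nvcc on PATH is release $(nvcc -V | nvcc_release)"
    # Show which toolkit directory the /usr/local/cuda symlink resolves to.
    readlink -f /usr/local/cuda || true
fi
```

If the release printed here is older than the samples' Makefile expects, building the samples that shipped with the same toolkit version is the simpler route.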

1.8 Setting the error aside for now: cuDNN is fine

The cuDNN and CUDA paths are fine.

cat /usr/local/cuda/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
cat /usr/local/cuda/version.txt
CUDA Version 11.0.207
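
The grep above prints the three `#define` lines separately. As a small convenience, the sketch below joins them into a single version string. It assumes the header uses `#define CUDNN_MAJOR n`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL` lines and lives at the path used above; `cudnn_version` is a hypothetical helper name.

```shell
#!/bin/sh
# cudnn_version HEADER: print MAJOR.MINOR.PATCHLEVEL taken from the
# "#define CUDNN_MAJOR n" style lines in HEADER.
cudnn_version() {
    awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
         $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
         $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
         END { if (major != "") print major "." minor "." patch }' "$1"
}

header=/usr/local/cuda/include/cudnn_version.h
if [ -r "$header" ]; then
    echo "cuDNN $(cudnn_version "$header")"
fi
```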

2. Installing Caffe

sudo apt install --no-install-recommends libboost-all-dev
sudo apt install cmake git unzip libgflags-dev libgoogle-glog-dev libprotobuf-dev libleveldb-dev liblmdb-dev libsnappy-dev libhdf5-serial-dev protobuf-compiler libatlas-base-dev libopenblas-dev liblapack-dev python3-dev python3-skimage graphviz python-protobuf
pip install --upgrade pip
pip install numpy pydot protobuf scikit-image

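Along the way a dependency conflict came up, which was resolved by downgrading a package. I rejected aptitude's first proposal and accepted the downgrade it offered next:

Keep the following packages at their current version:
1) libhdf5-dev [Not Installed]
2) libprotobuf-dev [Not Installed]
3) python3-dev [Not Installed]
4) python3.8-dev [Not Installed]
5) zlib1g-dev [Not Installed]

Leave the following dependencies unresolved:
6) protobuf-compiler recommends libprotobuf-dev

Accept this solution? [Y/n/q/?] n
The following actions will resolve these dependencies:

Downgrade the following packages:
1) zlib1g [1:1.2.11.dfsg-2ubuntu1.2 (now) -> 1:1.2.11.dfsg-2ubuntu1 (focal)]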

Once that is resolved, clone Caffe with git:

git clone https://github.com/BVLC/caffe.git
cd caffe
cp Makefile.config.example Makefile.config
gedit Makefile.config
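
Most switches in Makefile.config.example ship commented out, for example `# WITH_PYTHON_LAYER := 1`, and editing the file mostly means uncommenting the ones that apply. For anyone who prefers to script that instead of using an editor, here is a minimal sketch. `enable_option` is a hypothetical helper of mine, and which options to enable depends on the setup, so the one in the example is only an illustration.

```shell
#!/bin/sh
# enable_option FILE NAME: uncomment the line "# NAME := value" in FILE.
# Goes through a temporary file instead of relying on `sed -i`.
enable_option() {
    tmp=$(mktemp) || return 1
    sed "s/^#[[:space:]]*\(${2}[[:space:]]*:=.*\)/\1/" "$1" > "$tmp" &&
        cat "$tmp" > "$1"
    status=$?
    rm -f "$tmp"
    return $status
}

# Example, to be run inside the caffe checkout: enable Python layers.
if [ -f Makefile.config ]; then
    enable_option Makefile.config WITH_PYTHON_LAYER
fi
```

Lines that are already uncommented, or that belong to other options, are left untouched.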

# Replace 3.x with the same Python version as Ubuntu's system Python
conda create -n caffe python=3.x
conda activate caffe
conda install -y numpy
conda install -y scikit-image

sudo cp -r /usr/lib/python3/dist-packages/caffe /home/guoba/anaconda3/envs/caffe/lib/python3.8/site-packages/caffe
# caffe_scr_dir is the Caffe install path, /usr/lib/python3/dist-packages/ by default
# anaconda_dir is the Anaconda install path, ~/anaconda by default
sudo cp -r caffe_scr_dir/google anaconda_dir/envs/caffe/lib/python3.x/site-packages/google

# CMake configuration for an OpenCV build with CUDA and cuDNN enabled
cmake -D CMAKE_BUILD_TYPE=RELEASE \
-D CMAKE_INSTALL_PREFIX=/usr/local \
-D OPENCV_EXTRA_MODULES_PATH=~/opencv_contrib/modules \
-D BUILD_TIFF=ON \
-D WITH_FFMPEG=ON \
-D WITH_GSTREAMER=ON \
-D WITH_TBB=ON \
-D BUILD_TBB=ON \
-D WITH_EIGEN=ON \
-D WITH_V4L=ON \
-D WITH_LIBV4L=ON \
-D WITH_VTK=OFF \
-D WITH_QT=OFF \
-D WITH_OPENGL=ON \
-D OPENCV_ENABLE_NONFREE=ON \
-D INSTALL_C_EXAMPLES=OFF \
-D INSTALL_PYTHON_EXAMPLES=OFF \
-D BUILD_NEW_PYTHON_SUPPORT=ON \
-D OPENCV_GENERATE_PKGCONFIG=ON \
-D BUILD_TESTS=OFF \
-D OPENCV_DNN_CUDA=ON \
-D ENABLE_FAST_MATH=ON \
-D CUDA_FAST_MATH=ON \
-D CUDA_ARCH_BIN=7.0 \
-D WITH_CUBLAS=ON \
-D WITH_CUDNN=ON \
-D CUDNN_LIBRARY=/usr/local/cuda/lib64/libcudnn.so.8.0.5 \
-D CUDNN_INCLUDE_DIR=/usr/local/cuda/include \
-D BUILD_EXAMPLES=OFF ..

Collected reference links:

https://cyfeng.science/2020/05/02/ubuntu-install-nvidia-driver-cuda-cudnn-suits/
https://www.lijingle.com/thread-36-1-1.html
https://zhuanlan.zhihu.com/p/339835760
https://www.dazhuanlan.com/2019/12/05/5de8098e42817
https://www.cnblogs.com/klchang/p/14353384.html