Before you start the installation, make sure SSH is working so that you can log in to the machine from another machine if the graphical UI gets stuck in a login loop or shows a black screen.
===============================================================
Before reinstalling CUDA, remove the old versions:
cd ~
rm -fr cuda
If a previous version was installed with apt-get or rpm, look up the removal instructions for that package manager elsewhere.
Use the following command to uninstall a Toolkit runfile installation:
$ sudo /usr/local/cuda-X.Y/bin/uninstall_cuda_X.Y.pl
Use the following command to uninstall a Driver runfile installation:
$ sudo /usr/bin/nvidia-uninstall
# Quote the pattern so the shell does not expand it against files in the current directory
sudo apt-get purge 'nvidia*'
# Note: this may remove your CUDA installation as well
sudo apt-get autoremove
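Before running the uninstall script it helps to know which X.Y versions are actually present. The following is a minimal sketch, assuming the runfile installer put each toolkit in a cuda-X.Y directory under /usr/local (as the uninstall path above suggests). The function name list_cuda_dirs is made up for this note.

```shell
#!/usr/bin/env bash
# List versioned CUDA toolkit directories (cuda-X.Y) under a prefix.
# The prefix defaults to /usr/local; pass another directory to override it.
list_cuda_dirs() {
    local prefix="${1:-/usr/local}"
    local dir
    for dir in "$prefix"/cuda-*; do
        # When nothing matches, the pattern stays unexpanded and is skipped here.
        [ -d "$dir" ] && basename "$dir"
    done
    return 0
}
```

Calling list_cuda_dirs with no arguments prints one directory name per line (for example cuda-8.0), which gives the X.Y to substitute into the uninstall_cuda_X.Y.pl path.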
=======================================================================
# Under construction
# Download CUDA and cuDNN
===============================================================
Read more at: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
Install cuDNN
https://developer.nvidia.com/rdp/cudnn-download
Download "cuDNN v5.1 Library for Linux" from the page above, then follow the installation instructions:
# from https://groups.google.com/forum/#!topic/theano-users/4qKbh5C_9e4
# First download the file. Example version: cudnn-7.5-linux-x64-v5.0-rc.tgz
Extract it to your home directory (this creates $HOME/cuda),
set LD_LIBRARY_PATH to include the lib64 subdirectory of the extracted directory,
and then run the steps below, assuming that /usr/local/cuda is a symlink to the correct CUDA version:
sudo cp $HOME/cuda/include/cudnn.h /usr/local/cuda/include/
sudo cp $HOME/cuda/lib64/libcudnn* /usr/local/cuda/lib64/
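After copying, it is worth confirming which cuDNN version ended up under /usr/local/cuda. The sketch below assumes a cuDNN v5-era cudnn.h that defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL directly in that header (other releases may keep these macros elsewhere). The function name cudnn_version is made up for this note.

```shell
#!/usr/bin/env bash
# Print MAJOR.MINOR.PATCHLEVEL as read from the #define lines in a cudnn.h header.
# Returns non-zero if the header has no CUDNN_MAJOR define or cannot be read.
cudnn_version() {
    local header="${1:-/usr/local/cuda/include/cudnn.h}"
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END {
            if (major == "") exit 1
            print major "." minor "." patch
        }
    ' "$header"
}
```

Run cudnn_version with no arguments to inspect the installed header, or pass $HOME/cuda/include/cudnn.h to check the extracted copy before installing it.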
======================================================================================
sudo /usr/local/cuda-X.Y/bin/uninstall_cuda_X.Y.pl
sudo /usr/bin/nvidia-uninstall
cd ~
rm -fr cuda
sudo apt-get --purge remove 'nvidia-*'
sudo apt-get purge 'nvidia*'
sudo apt-get autoremove
sudo service lightdm stop
sudo apt-get install linux-headers-$(uname -r)
cd Downloads
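# Use $HOME rather than ~ here: the shell does not expand ~ after "-extract="
./cuda_8.0.61_375.26_linux.run -extract=$HOME/Downloads/nvidia_installers
cd nvidia_installers
# The extracted file names depend on the runfile version; adjust the names below to match the files in this directory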
sudo ./NVIDIA-Linux-x86_64-367.48.run --no-opengl-files
sudo ./cuda-linux64-rel-8.0.44-21122537.run
sudo ./cuda-samples-linux-8.0.44-21122537.run
sudo update-initramfs -u
sudo reboot
# After the reboot, start the display manager if it is not already running
sudo service lightdm start
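The linux-headers step in the sequence above matters because the driver installer builds a kernel module against the headers of the running kernel. A small pre-flight check can be sketched as follows, assuming the usual Ubuntu layout in which /lib/modules/<release>/build points at the installed headers. The function name have_kernel_headers and its second parameter are made up for this note.

```shell
#!/usr/bin/env bash
# Succeed if the build directory for a kernel release exists.
# $1: kernel release (defaults to the running kernel)
# $2: modules root (defaults to /lib/modules; overridable for testing)
have_kernel_headers() {
    local release="${1:-$(uname -r)}"
    local modules_root="${2:-/lib/modules}"
    # -e follows symlinks, so a dangling "build" link counts as missing.
    [ -e "$modules_root/$release/build" ]
}
```

For example, running have_kernel_headers || echo "install linux-headers first" before launching the NVIDIA-Linux .run file flags a missing headers package early.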
========================================================================
cd /usr/local/cuda/samples/1_Utilities/deviceQuery
sudo make
./deviceQuery
# You should see something like this:
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 1060 6GB"
CUDA Driver Version / Runtime Version 8.0 / 8.0
CUDA Capability Major/Minor version number: 6.1
Total amount of global memory: 6072 MBytes (6367150080 bytes)
(10) Multiprocessors, (128) CUDA Cores/MP: 1280 CUDA Cores
GPU Max Clock rate: 1785 MHz (1.78 GHz)
Memory Clock rate: 4004 Mhz
Memory Bus Width: 192-bit
L2 Cache Size: 1572864 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GeForce GTX 1060 6GB
Result = PASS
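Rather than reading the whole report, the final "Result = PASS" line can be checked in a script. This is a minimal sketch that only looks for that line in whatever is piped in. The function name device_query_passed is made up for this note.

```shell
#!/usr/bin/env bash
# Read deviceQuery output on stdin and succeed only if it reports "Result = PASS".
device_query_passed() {
    grep -q 'Result = PASS'
}
```

Typical use from the deviceQuery directory would be ./deviceQuery | device_query_passed && echo "CUDA runtime OK".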
=======================================================================
# Using a Python virtual environment set up for deep learning
workon deep_learning
cd ~/Dropbox/cuda
THEANO_FLAGS='mode=FAST_RUN,device=gpu,floatX=float32,optimizer_including=cudnn' python gpu_test.py
=======================================================================
http://kislayabhi.github.io/Installing_CUDA_with_Ubuntu/
alternatively:
https://gist.github.com/wangruohui/df039f0dc434d6486f5d4d098aa52d07#check-the-installation
REMEMBER: DO NOT FOLLOW THE NVIDIA INSTALLATION INSTRUCTIONS
Note that, as of Nov. 2016, the latest version of CUDA is 8.0...
If the NVIDIA installer complains and shows an abort message, ignore the error.
Accept the symlink creation option.
Set LD_LIBRARY_PATH in .bashrc to use the /usr/local/cuda symlink rather than the versioned directory.
There may be some missing-library errors when compiling the /usr/local/cuda/samples directory.
Install the missing libraries, for example:
sudo apt-get install freeglut3-dev
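For the .bashrc note above, the lines might look like the sketch below. It assumes /usr/local/cuda is the symlink created by the installer. The variable name CUDA_HOME is just a convenience chosen for this note, and adding the bin directory to PATH is an extra step beyond what the note asks for (it makes nvcc available without a full path).

```shell
# Lines for ~/.bashrc: go through the /usr/local/cuda symlink, not /usr/local/cuda-X.Y,
# so that switching CUDA versions only requires repointing the symlink.
export CUDA_HOME=/usr/local/cuda
# ${VAR:+:$VAR} appends the old value only when it is non-empty, avoiding a stray ":".
export PATH="$CUDA_HOME/bin${PATH:+:$PATH}"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

Open a new shell (or source ~/.bashrc) afterwards so the samples build and nvcc pick up the new paths.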
/usr/local/cuda/samples$ sudo make
[...]
Finished building CUDA samples
/usr/local/cuda/samples$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 367.48 Sat Sep 3 18:21:08 PDT 2016
GCC version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4)
/usr/local/cuda/samples$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Sun_Sep__4_22:14:01_CDT_2016
Cuda compilation tools, release 8.0, V8.0.44
/usr/local/cuda/samples$ cd 1_Utilities/deviceQuery
/usr/local/cuda/samples/1_Utilities/deviceQuery$ ./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 1060 6GB"
CUDA Driver Version / Runtime Version 8.0 / 8.0
CUDA Capability Major/Minor version number: 6.1
Total amount of global memory: 6069 MBytes (6363873280 bytes)
(10) Multiprocessors, (128) CUDA Cores/MP: 1280 CUDA Cores
GPU Max Clock rate: 1785 MHz (1.78 GHz)
Memory Clock rate: 4004 Mhz
Memory Bus Width: 192-bit
L2 Cache Size: 1572864 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GeForce GTX 1060 6GB