EDUBERROCAL.NET

Making CUDA Work on That Old GT 710

01-July-2024

I have an old computer lying around that I mainly use as a backup server, but it has, incidentally, a small NVIDIA GPU on it. More specifically, a GeForce GT 710.

$ lspci | grep NVIDIA
01:00.0 VGA compatible controller: NVIDIA Corporation GK208B [GeForce GT 710] (rev a1)

I wanted to dust off some of my CUDA knowledge and thought I could start with this card. I did some digging and, luckily for me, the GT 710 can run CUDA. Following this guide, I installed CUDA and the driver needed for my specific card on my Debian 12 system:

$ sudo apt-get install linux-headers-$(uname -r)

$ sudo add-apt-repository contrib

$ sudo apt-key del 7fa2af80

$ wget https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/cuda-keyring_1.1-1_all.deb

$ sudo dpkg -i cuda-keyring_1.1-1_all.deb

$ sudo apt-get update

$ sudo apt-get -y install cuda

...reboot...

$ sudo apt-get install nvidia-tesla-470-driver

$ sudo modprobe nvidia-tesla-470

After this, we have to configure the system so that it can find the installed binaries and libraries. The following has to be run in the terminal and also added to the .bashrc file so that it persists across sessions:

export PATH=/usr/local/cuda-12.4/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-12.4/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
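The `${PATH:+:${PATH}}` part of those exports can look cryptic. As I understand it, it expands to a colon followed by the old value only when the variable is already set and non-empty, which avoids leaving a trailing colon (an empty PATH entry is treated as the current directory). Here is a minimal Python sketch of that behavior; `prepend_dir` is a name I made up for illustration, not something the shell or CUDA provides:

```python
def prepend_dir(new_dir, current):
    """Mimic the shell idiom NEW${VAR:+:${VAR}} (illustrative helper only)."""
    # ${VAR:+:${VAR}} yields ":" + VAR only when VAR is set and non-empty;
    # otherwise it yields nothing, so no stray ":" is appended.
    suffix = ":" + current if current else ""
    return new_dir + suffix


# Variable already has a value: the new directory goes in front of it.
print(prepend_dir("/usr/local/cuda-12.4/bin", "/usr/bin:/bin"))
# -> /usr/local/cuda-12.4/bin:/usr/bin:/bin

# Variable empty or unset (common for LD_LIBRARY_PATH): no trailing colon.
print(prepend_dir("/usr/local/cuda-12.4/lib64", ""))
# -> /usr/local/cuda-12.4/lib64
```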

One way to test that everything is OK is to run nvidia-smi:

$ nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
NVML library version: 550.54

As it turns out, I had a driver/library version mismatch. The CUDA package that gets installed is version 12.4, whose libraries require a newer driver than the one installed for my GT 710. My driver version is 470.223.02. According to this page, the minimum driver version for CUDA 12.x is 525.60.13, while the minimum for CUDA 11.x is 450.80.02, so I have to use CUDA 11:

$ sudo apt-get purge 'nvidia-*'

$ sudo apt-get purge 'cuda*'

$ sudo apt-get autoremove

$ wget https://developer.download.nvidia.com/compute/cuda/11.4.0/local_installers/cuda_11.4.0_470.42.01_linux.run

$ sudo sh cuda_11.4.0_470.42.01_linux.run

$ sudo apt-get install nvidia-tesla-470-driver

$ sudo modprobe nvidia-tesla-470
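The reasoning that led me to CUDA 11 is just a version comparison, and it can be sketched in a few lines of Python. The table below contains only the two minimum driver versions quoted above; the function names are my own and purely illustrative:

```python
# Minimum driver version per CUDA major release, as quoted in the text above.
MIN_DRIVER = {11: (450, 80, 2), 12: (525, 60, 13)}


def parse_version(text):
    """Turn a dotted version string such as '470.223.02' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def driver_supports(driver, cuda_major):
    """True if the driver is at least the minimum for that CUDA major release."""
    return parse_version(driver) >= MIN_DRIVER[cuda_major]


def newest_supported_cuda(driver):
    """Highest CUDA major release in the table that the driver can run, or None."""
    supported = [major for major in MIN_DRIVER if driver_supports(driver, major)]
    return max(supported) if supported else None


print(driver_supports("470.223.02", 12))    # False: 470 < 525
print(driver_supports("470.223.02", 11))    # True: 470 >= 450
print(newest_supported_cuda("470.223.02"))  # 11
```

Comparing tuples of integers instead of raw strings avoids surprises with zero-padded components such as "02".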

We must update the PATH and LD_LIBRARY_PATH variables, replacing 12.4 with 11.4:

export PATH=/usr/local/cuda-11.4/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.4/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

Finally:

$ nvidia-smi
Thu May 16 04:18:52 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.223.02   Driver Version: 470.223.02   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 N/A |                  N/A |
| 40%   35C    P0    N/A /  N/A |      0MiB /  2000MiB |     N/A      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
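If you want to check this from a script rather than by eye, the driver and CUDA versions can be pulled out of the header line. This is only a sketch that assumes the header layout shown above for driver 470; other driver releases may format it differently, and `parse_smi_header` is a hypothetical helper of mine:

```python
import re

# Matches the header line as printed by the 470 driver shown above (assumed layout).
HEADER_RE = re.compile(
    r"NVIDIA-SMI\s+(?P<smi>[\d.]+)\s+"
    r"Driver Version:\s+(?P<driver>[\d.]+)\s+"
    r"CUDA Version:\s+(?P<cuda>[\d.]+)"
)


def parse_smi_header(text):
    """Return the versions found in nvidia-smi output, or None if there is no header."""
    match = HEADER_RE.search(text)
    return match.groupdict() if match else None


# Header line copied from the output above.
good = "| NVIDIA-SMI 470.223.02   Driver Version: 470.223.02   CUDA Version: 11.4     |"
# First line of the earlier failure, which has no header at all.
bad = "Failed to initialize NVML: Driver/library version mismatch"

print(parse_smi_header(good))
print(parse_smi_header(bad))  # None
```

In real use, the text would come from running nvidia-smi, for example with `subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout`.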

Everything finally seems to be in order. However, when I try to compile the core samples, I run into another issue:

$ cd NVIDIA_CUDA-11.4_Samples/
$ make
...
139 | #error -- unsupported GNU version! gcc versions later than 10 are not supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.
| ^~~~~
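The check behind this error boils down to comparing the host compiler's major version against a limit (10, according to the message above). Here is a rough Python sketch of the same test. The sample version lines are what I would expect `gcc --version` to print on Debian and are assumptions for illustration, as are the function names:

```python
import re

# Limit stated in the nvcc error message above for CUDA 11.4.
MAX_GCC_MAJOR = 10


def gcc_major(version_line):
    """Extract the major version from the first line of `gcc --version` output."""
    # Take the last dotted number on the line, e.g. "12.2.0" in
    # "gcc (Debian 12.2.0-14) 12.2.0".
    versions = re.findall(r"\d+\.\d+(?:\.\d+)?", version_line)
    if not versions:
        raise ValueError("no version number found in: " + version_line)
    return int(versions[-1].split(".")[0])


def host_compiler_ok(version_line, max_major=MAX_GCC_MAJOR):
    """True if the compiler's major version does not exceed the limit."""
    return gcc_major(version_line) <= max_major


# Assumed sample lines, not captured from my machine.
print(host_compiler_ok("gcc (Debian 12.2.0-14) 12.2.0"))              # False
print(host_compiler_ok("gcc-10 (Debian 10.2.1-6) 10.2.1 20210110"))   # True
```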

On this system, I use the bookworm distribution of Debian, which installs version 12 of the C/C++ compilers and standard libraries, while CUDA 11.4 accepts GCC only up to version 10. Fortunately, the issue is easy to fix. For version 10 of GCC, we can install the package gcc-10:

$ sudo apt-get install gcc-10

This installs version 10.2.1-6 of GCC and libstdc. For C++, I had to download the packages and install them manually:

$ wget http://ftp.cz.debian.org/debian/pool/main/g/gcc-10/g++-10_10.2.1-6_amd64.deb

$ wget http://ftp.cz.debian.org/debian/pool/main/g/gcc-10/libstdc++-10-dev_10.2.1-6_amd64.deb

$ sudo dpkg -i libstdc++-10-dev_10.2.1-6_amd64.deb

$ sudo dpkg -i g++-10_10.2.1-6_amd64.deb

The only thing left to do is to make sure that the CUDA compiler finds those specific versions of the C/C++ compilers:

$ sudo ln -s /bin/gcc-10 /usr/local/cuda-11.4/bin/gcc
$ sudo ln -s /bin/g++-10 /usr/local/cuda-11.4/bin/g++

With that in place, I can compile all the CUDA samples just fine. After compilation, I run the deviceQuery sample to get all the details about my modest GeForce GT 710:

$ cd NVIDIA_CUDA-11.4_Samples/
$ make
...
Finished building CUDA samples
$
$ ./1_Utilities/deviceQuery/deviceQuery
./1_Utilities/deviceQuery/deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA GeForce GT 710"
  CUDA Driver Version / Runtime Version          11.4 / 11.4
  CUDA Capability Major/Minor version number:    3.5
  Total amount of global memory:                 2000 MBytes (2097283072 bytes)
  (001) Multiprocessors, (192) CUDA Cores/MP:    192 CUDA Cores
  GPU Max Clock rate:                            954 MHz (0.95 GHz)
  Memory Clock rate:                             800 Mhz
  Memory Bus Width:                              64-bit
  L2 Cache Size:                                 524288 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total shared memory per multiprocessor:        49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Managed Memory:                Yes
  Device supports Compute Preemption:            No
  Supports Cooperative Kernel Launch:            No
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.4, CUDA Runtime Version = 11.4, NumDevs = 1
Result = PASS
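A few of these numbers are related by simple arithmetic, which helps to see just how small this GPU is. The sketch below uses only figures copied from the output above; the derived block count is an upper bound based on thread count alone, since registers and shared memory can lower it further:

```python
# Figures copied from the deviceQuery output above.
multiprocessors = 1
cores_per_mp = 192
max_threads_per_mp = 2048
max_threads_per_block = 1024
warp_size = 32
global_mem_bytes = 2097283072

# Total CUDA cores: multiprocessors times cores per multiprocessor.
total_cores = multiprocessors * cores_per_mp

# Warps that can be resident on one multiprocessor at the same time.
warps_per_mp = max_threads_per_mp // warp_size

# Full-size (1024-thread) blocks that fit on one multiprocessor, by thread count only.
full_blocks_per_mp = max_threads_per_mp // max_threads_per_block

# Global memory in MiB; deviceQuery reports this rounded as "2000 MBytes".
global_mem_mib = global_mem_bytes / 2**20

print(total_cores)         # 192
print(warps_per_mp)        # 64
print(full_blocks_per_mp)  # 2
print(global_mem_mib)      # 2000.125
```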

I hope this is as useful to you as it was to me :-).