Installing CUDA On My Laptop

Recently I got interested in machine learning frameworks again, partly due to various projects at work and partly due to an old image classification challenge (determining whether a caprine image is upside-down or not) that I tried various CV methods on a few years ago. Most models I frequently use for work, like logistic regression and random forests, train quickly with sklearn on CPU; deep neural networks are the only ones I use that really need a GPU to train efficiently.

Trying out the Keras tutorial "Image classification from scratch", it quickly became apparent that the tiny 1-core, 3.5 GB RAM compute VM we have at work for each data scientist's non-intensive tasks wasn't going to cut it; it struggled with even just reading and filtering the image dataset. My personal ThinkPad (the P Series advertised as a "mobile workstation") with 4 cores and 24 GB RAM was significantly faster, taking about 25 seconds per batch of 128 training images. At that rate training would still take an hour or two, which is fine, but I remembered picking this laptop specifically because its Nvidia Quadro M1200 (Mobile) could run CUDA.
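
For context, the part that crushed the VM is the tutorial's very first pass over the data: walking the image folders and deleting corrupted JPEGs. A minimal sketch of that step, assuming the tutorial's cats-vs-dogs PetImages directory layout:

```python
import os

# Walk the dataset and drop corrupted JPEGs whose header lacks the "JFIF"
# marker, as in the Keras tutorial's preprocessing step. The "PetImages"
# path and Cat/Dog folder names assume the tutorial's dataset layout.
num_skipped = 0
for folder_name in ("Cat", "Dog"):
    folder_path = os.path.join("PetImages", folder_name)
    for fname in os.listdir(folder_path):
        fpath = os.path.join(folder_path, fname)
        with open(fpath, "rb") as fobj:
            is_jfif = b"JFIF" in fobj.peek(10)
        if not is_jfif:
            num_skipped += 1
            os.remove(fpath)  # delete unreadable images before training
print(f"Deleted {num_skipped} images.")
```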

Every single time I've tried to install CUDA or anything Nvidia-driver-related on Linux, it has been a pain, but the allure of fast GPU computation won out and I decided to bite the bullet and try once more. On Friday I followed a tutorial for Ubuntu 22.04, which recommended installing CUDA the usual way from Nvidia's keyring deb. Unsurprisingly, the graphics drivers broke, and systemd complained that my GPU was not compatible with the open Nvidia drivers. My secondary monitor stopped working, so I performed the ritual complete purge of nvidia-* to see if I could start over. The secondary monitor still wasn't working afterwards, so I had to reinstall the recommended closed-source Nvidia drivers to restore normality.

The next day, on Saturday afternoon, I steeled myself to try again. Thinking about the kernel error message, I remembered the "big news" when Nvidia open-sourced its drivers, but only for newer architectures. The Quadro M1200 is from 2017, and its Maxwell architecture only supports up to CUDA Compute Capability 5.0; the open kernel modules require Turing or newer, so a Maxwell card still needs the proprietary driver. This time I actually listened to the error, and online searching confirmed that I should install the proprietary drivers first and then the cuda-toolkit separately, since by default the installer pulls in the open-source drivers together with the cuda-toolkit. Somehow I managed to install the toolkit on its own, and I could compile and run deviceQuery from the CUDA Samples.
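
As a sanity check that doesn't require compiling the Samples, here's a rough Python equivalent of deviceQuery's core output via ctypes. The library name and the two attribute IDs (taken from the cudaDeviceAttr enum) are assumptions that may need adjusting for your toolkit version:

```python
import ctypes

# Load the CUDA runtime; "libcudart.so" assumes a standard Linux toolkit
# install with dev symlinks in place (adjust the path/version otherwise).
cudart = ctypes.CDLL("libcudart.so")

# Assumed enum values of cudaDevAttrComputeCapabilityMajor/Minor
# from cuda_runtime_api.h.
CUDA_DEV_ATTR_CC_MAJOR = 75
CUDA_DEV_ATTR_CC_MINOR = 76

count = ctypes.c_int()
if cudart.cudaGetDeviceCount(ctypes.byref(count)) != 0:
    raise RuntimeError("cudaGetDeviceCount failed -- is the driver loaded?")

for dev in range(count.value):
    major, minor = ctypes.c_int(), ctypes.c_int()
    cudart.cudaDeviceGetAttribute(ctypes.byref(major), CUDA_DEV_ATTR_CC_MAJOR, dev)
    cudart.cudaDeviceGetAttribute(ctypes.byref(minor), CUDA_DEV_ATTR_CC_MINOR, dev)
    print(f"GPU {dev}: compute capability {major.value}.{minor.value}")
```

On the M1200 this should report compute capability 5.0, matching the Maxwell limit above.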

Now I returned to my Python notebook, ready to have Keras use the GPU. But obviously it wasn't that simple: TensorFlow gave a strange error about no matching CUDA binary, which was surprising since I had just successfully compiled and run a CUDA sample. Funnily enough, more searching revealed that the precompiled TensorFlow 2.17 wheels on pip had just dropped compute capability 5.0 kernels, so I had to install the previous version, 2.16.1.
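
Two quick checks in the notebook make this failure mode obvious; a minimal sketch using the standard tf.config API:

```python
import tensorflow as tf

# After `pip install tensorflow==2.16.1` -- per the above, the last release
# whose prebuilt wheels still ship compute capability 5.0 kernels.
print(tf.__version__)

# An empty list here means TensorFlow can't see a usable GPU; success looks
# like [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')].
print(tf.config.list_physical_devices("GPU"))
```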

Finally I could run GPU training, but the model trained for about a minute before it shat itself, saying it ran out of memory trying to allocate 10 GB. At first I was afraid the tutorial's scaled-down version of the Xception network was still too deep, and that I wouldn't be able to run it at all without drastically cutting down the layers. But most of that memory goes to activations, which scale with batch size, and my GPU only has 4 GB of memory; so I scaled the batch size down from 128 to 32, and thankfully this worked. The cats and dogs were classified with high accuracy, and I finally had a working CUDA installation (for the time being).
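
For reference, the fix was just the batch size. Here's a sketch of the tutorial's loader with the smaller batch, plus TensorFlow's optional memory-growth setting; the path, image size, and seed are assumed from the Keras tutorial:

```python
import tensorflow as tf

# Optional: grow GPU memory on demand instead of reserving it all up front.
# Must be set before the GPU is first used.
for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)

# The tutorial's dataset loader, with batch_size cut from 128 to 32 so the
# per-step activations fit in 4 GB of GPU memory.
train_ds, val_ds = tf.keras.utils.image_dataset_from_directory(
    "PetImages",
    validation_split=0.2,
    subset="both",
    seed=1337,
    image_size=(180, 180),
    batch_size=32,
)
```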