Allow training even when no GPU is available #8
Fixes the error raised when the --accelerator cpu option was given. It is possible to train with only a CPU too, given small batches, a considerable amount of memory, and days for the training process to run. The error was:
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
Isn't training on CPU very slow? Are you using small datasets when training on CPU?
Yes, it is pretty slow. I see your point. I guess the 'cpu' option could be handled gracefully as a fallback, in case the GPU quota has been reached. As an aside: right now I'm working on adapting this to Kaggle, which offers 30 hours per week of GPU and 20 hours of TPU. I'm still testing the CPU, given that it is throwing a similar error. I'm still polishing the notebook.
I'm interested in the Kaggle notebook if you get it working. Kaggle also supports multi-GPU, so I think you can use 2x T4, but I have no idea whether it gives a noticeable speed gain. For multi-GPU to work, though, I think you'll have to modify the Piper source code directly.
https://www.kaggle.com/code/scratchpad/notebook6c3922c7e3/edit I just opened it to the public; it is in an alpha state. I haven't been able to use it properly with the two GPUs, nor with the TPU. It runs with CPU, a single T4 (from the 2x T4 option), and GPU P100, the latter being about 3x faster than one T4. I'm new to ML, Jupyter notebooks, and Kaggle. The features of this notebook are:
Right now I'm adding widgets to make it more usable. I would be glad to continue the conversation elsewhere if needed. BTW, thanks for all your hard work. Other things to be done:
I have been testing TPU on Kaggle, and it seems that calling CUDA directly is not recommended. For now, a sample with Lightning and TPU.
This bugfix allows training when only a CPU is available.
Previously, when the user gave the --accelerator cpu option on a machine without a GPU, the training process stopped with the error:
RuntimeError: Found no NVIDIA driver on your system. Please check that
you have an NVIDIA GPU and installed a driver from
http://www.nvidia.com/Download/index.aspx
Now, when I pass the 'cpu' option and I do not have a GPU, I'm able to train.
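
For readers following the thread, here is a minimal sketch of the kind of graceful handling discussed above. It is not the code from this PR. The helper names `build_parser` and `resolve_accelerator`, the `gpu` and `auto` choices, and the fallback-with-warning policy are all assumptions made for illustration; only the `--accelerator cpu` option comes from the discussion.

```python
import argparse
import warnings


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: only "--accelerator cpu" appears in the PR;
    # the "gpu" and "auto" choices are assumed for this sketch.
    parser = argparse.ArgumentParser(description="accelerator selection sketch")
    parser.add_argument(
        "--accelerator",
        choices=("cpu", "gpu", "auto"),
        default="auto",
        help="device type to train on",
    )
    return parser


def resolve_accelerator(requested: str, cuda_available: bool) -> str:
    """Pick the accelerator without touching CUDA when 'cpu' is requested."""
    if requested == "cpu":
        # Never probe the GPU here, so a missing NVIDIA driver cannot raise.
        return "cpu"
    if requested in ("gpu", "auto"):
        if cuda_available:
            return "gpu"
        if requested == "gpu":
            # Assumed policy: warn and fall back instead of aborting.
            warnings.warn("No GPU available, falling back to CPU")
        return "cpu"
    raise ValueError(f"Unknown accelerator: {requested!r}")


if __name__ == "__main__":
    args = build_parser().parse_args()
    # In a real training script the flag would come from
    # torch.cuda.is_available(); it is hard-coded here to stay dependency-free.
    print(resolve_accelerator(args.accelerator, cuda_available=False))
```

The CUDA check is passed in as a plain boolean so the selection logic can be tested on a machine with no GPU and no PyTorch install.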