This error stops training when the 'ExecutionEnvironment' is 'parallel', 'multi-gpu', or 'gpu'. Training is running uninterrupted when set to 'cpu'. I'm running code for the first time on Ubuntu 20.04.2 LTS system with Intel i9 12 core cpu and 2x 3070 gpu's. It indicates only 12 workers and seems to not recognize the gpus.
Kshitij Singh answered .
2025-11-20
setenv NVIDIA_TF32_OVERRIDE 0