Fixing Slow Multi-GPU Semantic Segmentation Performance

ritasingh - 2021-04-13T10:14:08+00:00

Question: Multiple GPUs perform slower than single GPU to train a semantic segmentation network

I have two NVIDIA Tesla V100 16 GB GPUs at my disposal to train a deep neural network for semantic segmentation. I am training the Inception-ResNet-v2 network with the DeepLab v3+ architecture, and I use randomPatchExtractionDatastore to feed the network with training data. When I set the 'ExecutionEnvironment' option to 'multi-gpu', the processing time per iteration is higher than with 'gpu', that is, a single GPU. I am working on Windows 10 with MATLAB R2019b. What should I do to use the full potential of both GPUs for training? Below is an example of my code:

    pathSize = 512;
    imageSize = [pathSize pathSize 3];
    numClasses = 6;
    lgraph = deeplabv3plusLayers(imageSize, numClasses, 'inceptionresnetv2', 'DownsamplingFactor', 16);

    MaxEpochs = 10;
    PatchesPerImage = 1500;
    MiniBatchSize = 20;

    % 'gpu' is the single-GPU run; 'multi-gpu' is the slower two-GPU run
    options = trainingOptions('sgdm', ...
        'ExecutionEnvironment','gpu', ...
        'LearnRateSchedule','piecewise', ...
        'LearnRateDropPeriod',3, ...
        'LearnRateDropFactor',0.2, ...
        'Momentum',0.9, ...
        'InitialLearnRate',0.03, ...
        'L2Regularization',0.001, ...
        'MaxEpochs',MaxEpochs, ...
        'MiniBatchSize',MiniBatchSize, ...
        'Shuffle','every-epoch', ...
        'CheckpointPath',tempdir, ...
        'VerboseFrequency',2, ...
        'Plots','training-progress', ...
        'ValidationPatience',4);

    imageAugmenter = imageDataAugmenter( ...
        'RandRotation',[-20,20], ...
        'RandXTranslation',[-10 10], ...
        'RandYTranslation',[-10 10]);

    % Random patch extraction datastore
    % (imds and pxds are the image and pixel-label datastores, created elsewhere)
    PatchSize = [pathSize pathSize];
    dsTrain = randomPatchExtractionDatastore(imds, pxds, PatchSize, ...
        'PatchesPerImage',PatchesPerImage, ...
        'DataAugmentation',imageAugmenter);

    [net, ~] = trainNetwork(dsTrain, lgraph, options);

Expert Answer

Prashant Kumar answered . 2025-11-20

On Windows it is difficult to get any benefit from multi-GPU training, because of GPU communication issues on that platform. This will be improved in a future release. In the meantime, try the following:

Maximize PatchesPerImage and MiniBatchSize
Increase the learning rate (InitialLearnRate) to match the number of GPUs

If moving to Linux is an option for you, that is definitely the way to go.
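
As a rough sketch of the two suggestions above, the snippet below scales the MiniBatchSize and InitialLearnRate from the question by the number of GPUs. The variable names (numGPUs, baseMiniBatchSize, baseLearnRate, scaledMiniBatchSize, scaledLearnRate) and the linear scaling rule are assumptions made for illustration, not values confirmed to work for this network, and the remaining options from the question are left out for brevity.

```matlab
% Sketch only: the variable names and the linear scaling rule are assumptions.
numGPUs = 2;                 % the two Tesla V100s from the question
baseMiniBatchSize = 20;      % single-GPU MiniBatchSize from the question
baseLearnRate = 0.03;        % single-GPU InitialLearnRate from the question

% Scale both values in proportion to the number of GPUs.
scaledMiniBatchSize = baseMiniBatchSize * numGPUs;
scaledLearnRate = baseLearnRate * numGPUs;

% Same solver as in the question, switched to multi-GPU execution.
options = trainingOptions('sgdm', ...
    'ExecutionEnvironment','multi-gpu', ...
    'MiniBatchSize',scaledMiniBatchSize, ...
    'InitialLearnRate',scaledLearnRate, ...
    'Momentum',0.9, ...
    'MaxEpochs',10);
```

With 'multi-gpu', each mini-batch is split across the GPUs, so a MiniBatchSize of 40 should leave about 20 patches per GPU, roughly the same memory load as the single-GPU run. A doubled learning rate may turn out to be too aggressive for this network, so it is worth watching the training loss and lowering it again if training diverges.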
