Saved agent always gives constant output no matter how or how much I train it

rihana_01 - 2021-04-12T11:25:39+00:00

Question:

I trained a DDPG RL agent in a Simulink environment. The training looked fine to me, and I saved agents during the process. I have trained the RL agent using several different networks, but the saved agents always give a constant output (namely, the LowerLimit of the action). Please help me; I have been looking for help for the past week. Here is my setup:

INPUTMAX = 1E-4;
actionInfo = rlNumericSpec([2 1],'LowerLimit',-INPUTMAX,'UpperLimit',INPUTMAX);
actionInfo.Name = 'Inlet flow rate change';
observationInfo = rlNumericSpec([5 1],'LowerLimit',[300;300;1.64e5;0;0],'UpperLimit',[393;373;6e5;0.01;0.01]);
observationInfo.Name = 'Temperatures, Pressure and flow rates';
env = rlSimulinkEnv(mdl,[mdl '/RL Agent'],observationInfo,actionInfo);

L = 25; % number of neurons

%% CRITIC NETWORK
statePath = [
    featureInputLayer(5,'Normalization','none','Name','observation')
    fullyConnectedLayer(L,'Name','fc1')
    reluLayer('Name','relu1')
    concatenationLayer(1,2,'Name','concat')
    fullyConnectedLayer(29,'Name','fc2')
    reluLayer('Name','relu3')
    fullyConnectedLayer(29,'Name','fc3')
    reluLayer('Name','relu2')
    fullyConnectedLayer(1,'Name','fc4')
    ];
actionPath = [
    featureInputLayer(2,'Normalization','none','Name','action')
    fullyConnectedLayer(4,'Name','fcaction')
    reluLayer('Name','actionrelu')
    ];
criticNetwork = layerGraph(statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = connectLayers(criticNetwork,'actionrelu','concat/in2');
criticOptions = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1, ...
    'L2RegularizationFactor',1e-4,'UseDevice','gpu');
critic = rlQValueRepresentation(criticNetwork,observationInfo,actionInfo, ...
    'Observation',{'observation'},'Action',{'action'},criticOptions);
% plot(criticNetwork)

%% ACTOR NETWORK
actorNetwork = [
    featureInputLayer(5,'Normalization','none','Name','observation')
    fullyConnectedLayer(L,'Name','fc1')
    sigmoidLayer('Name','sig1')
    fullyConnectedLayer(L,'Name','fc4')
    reluLayer('Name','relu4')
    fullyConnectedLayer(2,'Name','fc5')
    tanhLayer('Name','tanh1')
    scalingLayer('Name','scale','Scale',INPUTMAX*ones(2,1))
    ];
actorNetwork = layerGraph(actorNetwork);
% plot(actorNetwork)
actorOptions = rlRepresentationOptions('LearnRate',1e-4,'GradientThreshold',1, ...
    'L2RegularizationFactor',1e-5,'UseDevice','gpu');
actor = rlDeterministicActorRepresentation(actorNetwork,observationInfo,actionInfo, ...
    'Observation',{'observation'},'Action',{'scale'},actorOptions);

agentOptions = rlDDPGAgentOptions( ...
    'TargetSmoothFactor',1e-3, ...
    'ExperienceBufferLength',1e4, ...
    'SampleTime',1, ...
    'DiscountFactor',0.99, ...
    'MiniBatchSize',64, ...
    'NumStepsToLookAhead',1, ...
    'SaveExperienceBufferWithAgent',true, ...
    'ResetExperienceBufferBeforeTraining',false);
agentOptions.NoiseOptions.Variance = 0.4;
agentOptions.NoiseOptions.VarianceDecayRate = 1e-5;
agent = rlDDPGAgent(actor,critic,agentOptions);

maxepisodes = 1000;
maxsteps = 500;
trainingOpts = rlTrainingOptions( ...
    'MaxEpisodes',maxepisodes, ...
    'MaxStepsPerEpisode',maxsteps, ...
    'Verbose',false, ...
    'Plots','training-progress', ...
    'ScoreAveragingWindowLength',50, ...
    'StopTrainingCriteria','AverageSteps', ...
    'StopTrainingValue',501, ...
    'SaveAgentCriteria','EpisodeReward', ...
    'SaveAgentValue',0);
trainingOpts.UseParallel = true;
trainingOpts.ParallelizationOptions.Mode = 'async';
trainingStats = train(agent,env,trainingOpts);
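For reference, one way to confirm the symptom is to query a saved agent directly with getAction on random in-range observations. This is a minimal sketch, not the asker's code; the file name 'Agent100.mat' and variable name 'saved_agent' are assumptions, so substitute whatever the SaveAgentCriteria run actually wrote out:

% Minimal check (sketch): sample the saved policy on random observations.
% 'Agent100.mat' and 'saved_agent' are placeholder names.
S = load('Agent100.mat');
savedAgent = S.saved_agent;
obsLo = [300;300;1.64e5;0;0];               % observation LowerLimit from the spec
obsHi = [393;373;6e5;0.01;0.01];            % observation UpperLimit from the spec
for k = 1:5
    obs = obsLo + rand(5,1).*(obsHi - obsLo);  % random observation within spec range
    a = getAction(savedAgent,{obs});
    if iscell(a), a = a{1}; end                % newer releases return a cell array
    disp(a')                                   % a constant -1e-4 reproduces the issue
end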

Expert Answer

Neeta Dsouza answered . 2025-11-20

The problem formulation is not correct. I suspect that even during training you are seeing a lot of bang-bang actions. The biggest issue is that the exploration noise variance is far too large for your action range: the actions are bounded to [-1e-4, 1e-4] (a range of 2e-4), while NoiseOptions.Variance = 0.4 gives a noise standard deviation of about 0.63, thousands of times the action range, so the noisy action saturates at a limit on nearly every step. This needs to be fixed. As the documentation notes, "It is common to set StandardDeviation*sqrt(Ts) to a value between 1% and 10% of your action range."
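A minimal sketch of the fix, assuming the SampleTime of 1 s and the action spec from the question (the 5% target is just one point inside the recommended 1-10% band):

% Sketch: size the exploration noise to the action range.
Ts = 1;                                % matches agentOptions.SampleTime
actionRange = 2*INPUTMAX;              % UpperLimit - LowerLimit = 2e-4
sigma = 0.05*actionRange/sqrt(Ts);     % target StandardDeviation*sqrt(Ts) ~ 5% of range
agentOptions.NoiseOptions.Variance = sigma^2;   % ~1e-10, instead of 0.4

With noise on this scale the actor's tanh/scaling output is perturbed within the action bounds rather than being railed to a limit, so the policy gradient actually sees the effect of different actions during training.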
 

