Hello,

Are pre-trained recurrent networks re-initialized when used in agents for reinforcement learning? If so, how can it be avoided?

I am importing an LSTM network trained with supervised learning as the actor for a PPO agent. When simulating without training, the reward is fine; however, if the agent is trained, the reward falls as if no pre-trained network had been used. I would expect the reward to be similar or higher after training, so presumably the network is being re-initialized. Is there a way around it?

Thanks

% Load actor
load(netDir);
actorNetwork = net.Layers;
actorOpts = rlRepresentationOptions('LearnRate',learnRate);
actor = rlStochasticActorRepresentation(actorNetwork,obsInfo,actInfo,'Observation',{'input'},actorOpts);

% Create critic
criticNetwork = [
    sequenceInputLayer(numObs,"Name","input")
    lstmLayer(numObs)
    softplusLayer()
    fullyConnectedLayer(1)];
criticOpts = rlRepresentationOptions('LearnRate',learnRate);
critic = rlValueRepresentation(criticNetwork,obsInfo,'Observation',{'input'},criticOpts);

% Create agent
agentOpts = rlPPOAgentOptions('ExperienceHorizon',expHorizon, ...
    'MiniBatchSize',miniBatchSz, ...
    'NumEpoch',nEpoch, ...
    'ClipFactor',0.1);
agent = rlPPOAgent(actor,critic,agentOpts);

% Train agent
trainOpts = rlTrainingOptions('MaxEpisodes',episodes, ...
    'MaxStepsPerEpisode',episodeSteps, ...
    'Verbose',false, ...
    'Plots','training-progress', ...
    'StopTrainingCriteria','AverageReward', ...
    'StopTrainingValue',10);

% Run training
trainingStats = train(agent,env,trainOpts);

% Simulate
simOptions = rlSimulationOptions('MaxSteps',2000);
experience = sim(env,agent,simOptions);
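One way to narrow this down is to check whether the pre-trained weights actually survive agent creation and training, rather than assuming they were re-initialized. Below is a minimal diagnostic sketch, assuming the variables from the listing above (net, actor, agent) are in the workspace; it uses getLearnableParameters and getActor from the Reinforcement Learning Toolbox, and the cell/array indices used in the comparison are illustrative and may need adjusting for a particular layer graph.

% Diagnostic sketch (assumes net, actor, agent from the listing above).
% Compare the imported LSTM weights with what the actor representation
% and the agent's internal actor actually hold.

% Weights of the first LSTM layer in the imported network
lstmIdx = find(arrayfun(@(l) isa(l,'nnet.cnn.layer.LSTMLayer'), net.Layers), 1);
pretrainedWeights = net.Layers(lstmIdx).InputWeights;

% Parameters held by the standalone actor representation
actorParams = getLearnableParameters(actor);

% Parameters held by the actor inside the agent (after rlPPOAgent was created)
agentActorParams = getLearnableParameters(getActor(agent));

% If the imported weights were kept, the corresponding entries should match.
% The index {1} assumes the LSTM input weights are the first learnable
% parameter; adjust as needed for your network.
disp(isequal(pretrainedWeights, actorParams{1}));     % actor vs. pre-trained net
disp(isequal(actorParams, agentActorParams));         % actor vs. copy inside agent

Running the same comparison again after train(...) shows whether the weights merely drifted during PPO updates (values change gradually) or were reset outright (values bear no resemblance to the imported ones), which are two different problems with different fixes.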