I am trying to tune my TD3 agent to solve a custom environment. The environment has two actions (rlNumericSpec): the first in [0, 10] and the second in [0, 2π). I am following the architecture from this example: https://in.mathworks.com/help/reinforcement-learning/ug/train-td3-agent-for-pmsm-control.html
Now I have the following questions:
1. Since tanh outputs values in [-1, 1], should I use a scaling layer at the end of the actor network, perhaps with the following values?
scalingLayer('Name','ActorScaling1','Scale',[5;pi],'Bias',[5;pi])
2. How do I set up the exploration noise and the target policy noise? That is, what should their variance values be? Not precisely tuned, but a sensible range, given that I have more than one action and the action ranges are not in [-1, 1].
3. How do I clip those values so they stay inside the action bounds? I don't see any such option in rlTD3AgentOptions.
All the TD3 examples (and most RL examples in general) use actions in [-1, 1]. I am confused about how to modify these parameters when the action space is not within [-1, 1], as in my case.
Neeta Dsouza answered on 2025-11-20
Great questions! Let's tackle each of your queries step-by-step:
1. Yes, you should use a scaling layer at the end of the actor network to map the tanh output to the desired ranges. The values you provided are correct:
scalingLayer('Name','ActorScaling1','Scale',[5;pi],'Bias',[5;pi])
The scaling layer computes Scale .* input + Bias, so it maps the first action from [-1, 1] to [0, 10] (5·[-1, 1] + 5) and the second from [-1, 1] to [0, 2π] (π·[-1, 1] + π).
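For context, here is a minimal sketch of how the tail of the actor network could look (layer names are placeholders; the rest of the network follows the PMSM example you linked):

```matlab
% Tail of the actor network: tanh bounds the raw output to [-1, 1],
% then the scaling layer maps it to [0, 10] x [0, 2*pi].
actorTail = [
    fullyConnectedLayer(2,'Name','ActorFC_out')   % two actions
    tanhLayer('Name','ActorTanh1')                % output in [-1, 1]
    scalingLayer('Name','ActorScaling1', ...
        'Scale',[5;pi],'Bias',[5;pi])             % -> [0,10] and [0,2*pi]
    ];
```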
2. For TD3, noise is added to the actions during training to encourage exploration, and a second, clipped noise is added to the target actor's actions during the critic update (target policy smoothing). Since your action ranges are not within [-1, 1], scale both noise levels to the actual ranges:
Exploration noise: a common rule of thumb is Gaussian noise with a standard deviation of roughly 5-10% of each action's range. For your ranges ([0, 10] and [0, 2π)), you might start with:
explorationNoiseVariance = [1 0.1]; % variances for the two actions (std. dev. of 1 and ~0.3)
explorationNoise = sqrt(explorationNoiseVariance) .* randn(size(action)); % illustrative only
Target policy noise: typically about half the exploration noise, clipped to a small range around zero.
Note that you do not add this noise yourself: the agent applies it internally, and you configure it through the ExplorationModel and TargetPolicySmoothModel properties of rlTD3AgentOptions.
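As a hedged sketch of that configuration (exact property names vary by toolbox release: older releases expose 'Variance'/'VarianceDecayRate' instead of 'StandardDeviation'/'StandardDeviationDecayRate', so check the rlTD3AgentOptions page for your release; the numeric values are starting points, not tuned):

```matlab
% Sketch: configure both noise models through rlTD3AgentOptions.
agentOpts = rlTD3AgentOptions;

% Exploration noise (Gaussian), per action: ~10% of range for action 1
% ([0,10] -> std 1) and ~5% of range for action 2 ([0,2*pi) -> std ~0.3).
agentOpts.ExplorationModel.StandardDeviation = [1; 0.3];
agentOpts.ExplorationModel.StandardDeviationDecayRate = 1e-5;

% Target policy smoothing noise: smaller than exploration noise, and
% clipped (LowerLimit/UpperLimit bound the noise itself, not the action).
agentOpts.TargetPolicySmoothModel.StandardDeviation = [0.5; 0.15];
agentOpts.TargetPolicySmoothModel.LowerLimit = [-1; -0.3];
agentOpts.TargetPolicySmoothModel.UpperLimit = [ 1;  0.3];
```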
3. While rlTD3AgentOptions doesn't expose an explicit action-clipping option, two things work in your favor. First, if your actor ends with tanh followed by the scaling layer above, the deterministic action is already bounded by construction. Second, the exploration noise can still push the noisy action outside those bounds, so as a safety net you can saturate the action inside your environment's step function using min and max.
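A minimal sketch of that safety net inside a custom environment's step function (the variable names actionLB/actionUB are assumptions for illustration):

```matlab
% Inside your environment's step function: saturate the incoming action
% to the declared bounds before using it, in case exploration noise
% pushed it outside the actor's scaled range.
actionLB = [0; 0];        % lower bounds for the two actions
actionUB = [10; 2*pi];    % upper bounds (2*pi treated as an inclusive cap)
action = min(max(action, actionLB), actionUB);
```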