Action Clipping and Scaling in TD3 in Reinforcement Learning

Asked by year_fhsjh on 2021-04-20
Question: Action Clipping and Scaling in TD3 in Reinforcement Learning

I am trying to tune my TD3 agent to solve a custom environment. The environment has two actions with the following ranges: the first in [0, 10] and the second in [0, 2π) (rlNumericSpec). I am following the architecture of this example: https://in.mathworks.com/help/reinforcement-learning/ug/train-td3-agent-for-pmsm-control.html I have the following questions:

1. Since tanh outputs values in [-1, 1], should I use a scaling layer at the end of the actor network, perhaps with the following values?

   scalingLayer('Name','ActorScaling1','Scale',[5;pi],'Bias',[5;pi])

2. How should I set up the exploration noise and the target policy noise? That is, what variance values should they have? Not precisely tuned values, but a sensible range, given that I have more than one action and the action ranges are not [-1, 1].

3. How do I clip the actions so that they stay inside the action bounds? I don't see any such option in rlTD3AgentOptions.

All the TD3 examples (and most RL examples in general) use an action range of [-1, 1]. I am confused about how to adjust the parameters when the action space is not within [-1, 1], as in my case.

Expert Answer

Neeta Dsouza answered 2025-11-20

Great questions! Let's tackle each of your queries step-by-step:

1. Scaling Layer for Actor Network

Yes, you should use the scaling layer at the end of the actor network to scale the actions to the desired range. The values you provided look correct:

scalingLayer('Name','ActorScaling1','Scale',[5;pi],'Bias',[5;pi])

The scaling layer computes output = Scale .* input + Bias, so each tanh output t ∈ [-1, 1] is mapped to 5t + 5 ∈ [0, 10] for the first action and πt + π ∈ [0, 2π] for the second. (The upper endpoint 2π is included rather than excluded; since the second action is an angle, 2π is equivalent to 0, so this is harmless in practice.)
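As a sketch, the tail of the actor network could then look like the following. The layer names and the fully connected layer are illustrative placeholders, not taken from your network:

```matlab
% Final layers of the actor network (names are illustrative).
% tanh bounds the raw output to [-1, 1]; the scaling layer then maps
% it to [0, 10] for the first action and [0, 2*pi] for the second.
actorTail = [
    fullyConnectedLayer(2,'Name','ActorFC_out')
    tanhLayer('Name','ActorTanh')
    scalingLayer('Name','ActorScaling1','Scale',[5;pi],'Bias',[5;pi])
    ];
```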

2. Exploration Noise and Target Policy Noise

For TD3, it's important to add noise to the actions to encourage exploration. Given your action space ranges are not within [-1, 1], you'll need to adjust the noise accordingly:

  • Exploration Noise: This noise is added to the actions during training to explore the action space. A common rule of thumb is Gaussian noise whose standard deviation is roughly 1–10% of each action's range. For your ranges ([0, 10] and [0, 2π)), you might start with:

    explorationNoiseVariance = [1 0.1]; % Variances for the two actions
    explorationNoise = sqrt(explorationNoiseVariance) .* randn(size(action));

  • Target Policy Noise: This noise is added to the actions produced by the target actor when computing the critic targets (target policy smoothing). It is typically smaller than the exploration noise and is clipped, so that the smoothing stays local. You might start with:

    targetPolicyNoiseVariance = [0.5 0.05]; % Variances for the two actions
    targetPolicyNoise = sqrt(targetPolicyNoiseVariance) .* randn(size(action));
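In practice, with the Reinforcement Learning Toolbox you configure both noise sources through rlTD3AgentOptions rather than adding noise by hand. A sketch, assuming the GaussianActionNoise properties StandardDeviation, StandardDeviationDecayRate, LowerLimit, and UpperLimit; the numeric values are untuned starting points for your action ranges, not recommendations:

```matlab
% Sketch: configuring both noise models via rlTD3AgentOptions.
agentOpts = rlTD3AgentOptions;

% Exploration noise (Gaussian), one standard deviation per action.
agentOpts.ExplorationModel.StandardDeviation = [1; 0.3];
agentOpts.ExplorationModel.StandardDeviationDecayRate = 1e-5;

% Target policy smoothing noise, clipped so target actions stay
% close to the target actor's output.
agentOpts.TargetPolicySmoothingModel.StandardDeviation = [0.5; 0.15];
agentOpts.TargetPolicySmoothingModel.LowerLimit = [-1; -0.3];
agentOpts.TargetPolicySmoothingModel.UpperLimit = [1; 0.3];
```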
3. Clipping Action Values

rlTD3AgentOptions doesn't have a built-in action-clipping option. Two things help here. First, define your action space with finite bounds, e.g. rlNumericSpec([2 1],'LowerLimit',[0;0],'UpperLimit',[10;2*pi]), and bound the actor output with the tanh + scaling layers above so the deterministic policy cannot leave the range. Second, for any actions you compute or perturb yourself, clip them manually with the min and max functions after scaling.
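A minimal sketch of such manual clipping, with the bounds assumed from your action ranges (scaledAction here is a placeholder for the actor's scaled output):

```matlab
% Clip a noisy action vector to the environment's action bounds.
lowerBound = [0; 0];
upperBound = [10; 2*pi];
noisyAction = scaledAction + explorationNoise;
clippedAction = min(max(noisyAction, lowerBound), upperBound);
```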

