
TD3

Published 2024-06-23 17:58:49

class TD3(model, gamma=None, tau=None, actor_lr=None, critic_lr=None, policy_noise=0.2, noise_clip=0.5, policy_freq=2)

__init__(model, gamma=None, tau=None, actor_lr=None, critic_lr=None, policy_noise=0.2, noise_clip=0.5, policy_freq=2)

TD3 (Twin Delayed Deep Deterministic Policy Gradient) algorithm.

Parameters:

model (parl.Model) – forward network of the actor and critic.
gamma (float) – discount factor for reward computation.
tau (float) – soft-update coefficient used when updating the weights of self.target_model from self.model.
actor_lr (float) – learning rate of the actor model.
critic_lr (float) – learning rate of the critic model.
policy_noise (float) – scale (standard deviation) of the noise added to the target policy's action during the critic update.
noise_clip (float) – bound used to clip the target policy noise, i.e. the noise is kept within [-noise_clip, noise_clip].
policy_freq (int) – frequency of delayed policy updates: the policy is updated once every policy_freq critic updates.
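The roles of policy_noise, noise_clip, policy_freq and tau can be illustrated with a small standalone sketch. This is not PARL's implementation: the helper names (smoothed_target_action, soft_update, should_update_actor), the max_action bound, and the convention that tau weights the online network are assumptions made for illustration, and plain Python lists stand in for tensors.

```python
import random


def smoothed_target_action(target_action, policy_noise=0.2, noise_clip=0.5,
                           max_action=1.0, rng=random):
    """Target policy smoothing: add clipped Gaussian noise to a target action.

    max_action is a hypothetical symmetric action bound, not a TD3 argument.
    """
    noisy = []
    for a in target_action:
        noise = rng.gauss(0.0, policy_noise)              # std = policy_noise
        noise = max(-noise_clip, min(noise_clip, noise))  # clip the noise
        noisy.append(max(-max_action, min(max_action, a + noise)))
    return noisy


def soft_update(target_weights, weights, tau):
    """Move each target weight a fraction tau toward the online weight.

    Assumes the convention target = (1 - tau) * target + tau * online.
    """
    return [(1.0 - tau) * t + tau * w
            for t, w in zip(target_weights, weights)]


def should_update_actor(step, policy_freq=2):
    """Delayed update: the actor is updated once every policy_freq critic steps."""
    return step % policy_freq == 0
```

With the defaults, the noise added to each action component never exceeds 0.5 in magnitude, and the actor is updated on every second critic update.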

learn(obs, action, reward, next_obs, terminal): Define the loss function and create an optimizer to minimize the loss.
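As a rough illustration of the kind of loss learn defines, the sketch below follows the standard TD3 formulation: both critics regress toward a shared target built from the smaller of the two target-critic estimates, and the actor maximizes the first critic's value of its own action. The function names and the per-sample scalar form are assumptions for readability; PARL's actual code works on batched tensors and may differ in detail.

```python
def td_target(reward, terminal, gamma, target_q1, target_q2):
    """Clipped double-Q target: y = r + gamma * (1 - terminal) * min(Q1', Q2')."""
    not_done = 1.0 - float(terminal)
    return reward + gamma * not_done * min(target_q1, target_q2)


def critic_loss(q1, q2, target):
    """Squared error of both critics against the same target value."""
    return (q1 - target) ** 2 + (q2 - target) ** 2


def actor_loss(q1_of_policy_action):
    """The actor minimizes the negative Q1 value of the action it proposes."""
    return -q1_of_policy_action


def batch_mean(values):
    """Average per-sample losses over a batch."""
    return sum(values) / len(values)
```

For a terminal transition the bootstrap term vanishes, so the target reduces to the immediate reward.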

predict(obs): Refine the predicting process, e.g., use the policy model to predict actions.
