MADDPG
- class MADDPG(model, agent_index=None, act_space=None, gamma=None, tau=None, actor_lr=None, critic_lr=None)[source]
Bases: Algorithm
- Q(obs_n, act_n, use_target_model=False)[source]
Use the value model to predict Q values; a usage sketch follows the parameter list.
- Parameters:
obs_n (list of paddle tensor) – observations of all agents; a list of length num_agents, each tensor with shape [B] + shape of obs_n
act_n (list of paddle tensor) – actions of all agents; a list of length num_agents, each tensor with shape [B] + shape of act_n
use_target_model (bool) – whether to use the target model
- Returns:
Q value of this agent, with shape [B]
- Return type:
Q (paddle tensor)
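A minimal sketch of how the obs_n and act_n arguments can be assembled. All dimensions (3 agents, batch size 32, 10-dim observations, 5-dim actions) are illustrative assumptions, and maddpg stands for an already constructed MADDPG instance (see the construction sketch after the __init__ parameters below).

```python
import paddle

# Hypothetical dimensions: 3 agents, batch size B=32,
# 10-dim observations and 5-dim continuous actions per agent.
B, num_agents, obs_dim, act_dim = 32, 3, 10, 5

# One tensor per agent, each with a leading batch dimension [B].
obs_n = [paddle.randn([B, obs_dim]) for _ in range(num_agents)]
act_n = [paddle.randn([B, act_dim]) for _ in range(num_agents)]

# `maddpg` is assumed to be a constructed MADDPG instance (see __init__ below).
q_values = maddpg.Q(obs_n, act_n, use_target_model=False)  # paddle tensor, shape [B]
```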
- __init__(model, agent_index=None, act_space=None, gamma=None, tau=None, actor_lr=None, critic_lr=None)[source]
MADDPG algorithm; a construction sketch follows the parameter list.
- Parameters:
model (parl.Model) – forward networks of the actor and critic. The model must implement get_actor_params().
agent_index (int) – index of this agent in the multi-agent environment
act_space (list) – list of gym action spaces, one per agent
gamma (float) – discount factor for reward computation
tau (float) – soft-update coefficient used when updating the weights of self.target_model towards self.model
critic_lr (float) – learning rate of the critic model
actor_lr (float) – learning rate of the actor model
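A hedged construction sketch: MAModel is a hypothetical user-defined parl.Model subclass that builds the actor and critic networks and implements get_actor_params() as required above; the dimensions and hyperparameter values are illustrative, not recommended defaults.

```python
from gym import spaces
from parl.algorithms.paddle.maddpg import MADDPG

# Hypothetical continuous action spaces for a 3-agent environment
# (Discrete spaces can be used instead for discrete-action tasks).
act_space = [spaces.Box(low=-1.0, high=1.0, shape=(5,)) for _ in range(3)]

# MAModel is a hypothetical parl.Model subclass defining the actor and
# critic networks; it must implement get_actor_params().
model = MAModel(obs_dim=10, act_dim=5)

maddpg = MADDPG(
    model,
    agent_index=0,        # this agent's index in the multi-agent env
    act_space=act_space,  # list of gym action spaces, one per agent
    gamma=0.95,           # discount factor
    tau=0.01,             # soft-update coefficient for the target model
    actor_lr=1e-3,
    critic_lr=1e-3,
)
```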
- predict(obs)[source]
Use the policy model to predict actions; a usage sketch follows the parameter list.
- Parameters:
obs (paddle tensor) – observation of this agent, with shape [B] + shape of obs_n[agent_index]
- Returns:
action, with shape [B] + shape of act_n[agent_index]; note that in the discrete case the argmax along the last axis is taken as the action
- Return type:
act (paddle tensor)
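A usage sketch for predict(), reusing the hypothetical maddpg instance and dimensions from the construction sketch above.

```python
import paddle

# Batch of observations for this agent: [B] + shape of obs_n[agent_index].
obs = paddle.randn([32, 10])

# Action predicted by the current policy model: [B] + shape of act_n[agent_index].
action = maddpg.predict(obs)
```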
- sample(obs, use_target_model=False)[source]
Use the policy model to sample actions; a usage sketch follows the parameter list.
- Parameters:
obs (paddle tensor) – observation of this agent, with shape [B] + shape of obs_n[agent_index]
use_target_model (bool) – whether to use the target model
- Returns:
action, with shape [B] + shape of act_n[agent_index]; note that in the discrete case the argmax along the last axis is taken as the action
- Return type:
act (paddle tensor)
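A usage sketch for sample(), again with the hypothetical maddpg instance; setting use_target_model=True queries the target policy model instead of the current one.

```python
import paddle

obs = paddle.randn([32, 10])  # [B] + shape of obs_n[agent_index]

# Sample an action from the current policy model.
action = maddpg.sample(obs)

# Sample from the target policy model instead.
target_action = maddpg.sample(obs, use_target_model=True)
```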