QMIX
- class QMIX(agent_model, qmixer_model, double_q=True, gamma=0.99, lr=0.0005, clip_grad_norm=None)[source]
Bases:
Algorithm
- __init__(agent_model, qmixer_model, double_q=True, gamma=0.99, lr=0.0005, clip_grad_norm=None)[source]
QMIX algorithm
- Parameters:
agent_model (parl.Model) – each agent's local Q network, used for decision making.
qmixer_model (parl.Model) – a mixing network that takes the agents' local Q values as input and combines them into a global Q value.
double_q (bool) – whether to use Double-DQN targets.
gamma (float) – discount factor for reward computation.
lr (float) – learning rate.
clip_grad_norm (None or float) – if set, gradients are clipped so that their global norm does not exceed this value.
- learn(state_batch, actions_batch, reward_batch, terminated_batch, obs_batch, available_actions_batch, filled_batch)[source]
- Parameters:
state_batch (paddle.Tensor) – (batch_size, T, state_shape)
actions_batch (paddle.Tensor) – (batch_size, T, n_agents)
reward_batch (paddle.Tensor) – (batch_size, T, 1)
terminated_batch (paddle.Tensor) – (batch_size, T, 1)
obs_batch (paddle.Tensor) – (batch_size, T, n_agents, obs_shape)
available_actions_batch (paddle.Tensor) – (batch_size, T, n_agents, n_actions)
filled_batch (paddle.Tensor) – (batch_size, T, 1)
- Returns:
loss (float) – train loss; td_error (float) – train TD error.
- Return type:
(float, float)
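The shapes accepted by learn() can be illustrated with a small NumPy sketch of the core QMIX update: pick next actions with the online network and evaluate them with the target network (the Double-DQN target enabled by double_q), mix the per-agent values into a global Q value, and mask padded timesteps with filled_batch. The mixer here is a plain sum standing in for the real state-conditioned qmixer_model, and all tensors are random stand-ins for network outputs, not PARL API calls.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shapes follow the learn() signature: (batch_size, T, ...)
batch_size, T, n_agents, n_actions = 2, 5, 3, 4

# Hypothetical stand-ins for network outputs; in PARL these would come
# from agent_model (online) and its target copy.
local_q = rng.normal(size=(batch_size, T, n_agents, n_actions))
target_local_q = rng.normal(size=(batch_size, T, n_agents, n_actions))

reward = rng.normal(size=(batch_size, T, 1))        # reward_batch
terminated = np.zeros((batch_size, T, 1))           # terminated_batch
filled = np.ones((batch_size, T, 1))                # filled_batch
gamma = 0.99

# Double-DQN target: select next actions with the online network,
# then evaluate those actions with the target network.
next_actions = local_q.argmax(axis=-1)                       # (B, T, n_agents)
chosen_target_q = np.take_along_axis(
    target_local_q, next_actions[..., None], axis=-1
).squeeze(-1)                                                # (B, T, n_agents)

# Stand-in mixer: a plain sum over agents (the real qmixer_model is a
# monotonic network conditioned on state_batch).
q_tot_target = chosen_target_q.sum(axis=-1, keepdims=True)   # (B, T, 1)

# One-step TD target y_t = r_t + gamma * (1 - done_t) * Q_tot(s_{t+1});
# shifting along T pairs each step with its successor.
y = reward[:, :-1] + gamma * (1.0 - terminated[:, :-1]) * q_tot_target[:, 1:]

# Online mixed value for the taken actions (greedy here for simplicity).
q_tot = local_q.max(axis=-1).sum(axis=-1, keepdims=True)[:, :-1]

# filled_batch masks padded timesteps so they do not contribute to the loss.
mask = filled[:, :-1]
td_error = (y - q_tot) * mask
loss = (td_error ** 2).sum() / mask.sum()
print(float(loss))
```

This only illustrates the tensor bookkeeping; the actual learn() additionally applies available_actions_batch when taking the argmax and backpropagates through both networks with the configured optimizer.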