TF-Agents action_spec: how do I define the correct shape for a discrete action space?

Published on 2025-01-22 11:08:21

Scenario 1

My custom environment has the following _action_spec:

self._action_spec = array_spec.BoundedArraySpec(
            shape=(highestIndex+1,), dtype=np.int32, minimum=0, maximum=highestIndex, name='action')

Therefore my actions are represented as simple integer values between 0 and highestIndex. utils.validate_py_environment(env, episodes=5) works perfectly fine and steps through my environment.
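For context, `shape=(highestIndex+1,)` does not describe a single integer between 0 and `highestIndex`; it describes a vector of `highestIndex+1` integers per step. The distinction can be sketched with plain NumPy (the value 13207 is taken from the error message in Scenario 3 below):

```python
import numpy as np

highestIndex = 13207  # value visible in the DqnAgent error below

# shape=(highestIndex+1,): every action is a whole vector of integers
vector_action = np.zeros(highestIndex + 1, dtype=np.int32)
print(vector_action.shape)  # (13208,)

# shape=(): every action is a single 0-d integer array, i.e. a scalar
scalar_action = np.array(7, dtype=np.int32)
print(scalar_action.shape)  # ()
print(scalar_action.ndim)   # 0
```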

I want to train a DQN. Therefore, I build a q_network:

q_net = q_network.QNetwork(
        train_env.observation_spec(),
        train_env.action_spec(),
        fc_layer_params=fc_layer_params)

Unfortunately, I get the following error when I call these lines:

ValueError: Network only supports action_specs with shape in [(), (1,)])
  In call to configurable 'QNetwork' (<class 'tf_agents.networks.q_network.QNetwork'>)

Scenario 2

I tried to change the shape of _action_spec to () (like in the following tutorials for a similar environment: https://www.tensorflow.org/agents/tutorials/2_environments_tutorial or https://towardsdatascience.com/tf-agents-tutorial-a63399218309):

self._action_spec = array_spec.BoundedArraySpec(
            shape=(), dtype=np.int32, minimum=0, maximum=highestIndex, name='action')

After these changes I can create the q-network but these changes lead to the following error when utils.validate_py_environment(env, episodes=5) or driver.run() is called:

TypeError: iteration over a 0-d array
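That TypeError comes from NumPy rather than TF-Agents: with `shape=()` the action reaches the environment as a 0-d array, and 0-d arrays cannot be iterated over. A minimal reproduction, with the usual workaround of unwrapping the scalar first (the for-loop here only stands in for whatever iteration the environment's `_step` does):

```python
import numpy as np

action = np.array(5, dtype=np.int32)  # what a shape=() spec delivers

try:
    for _ in action:  # iterating a 0-d array raises
        pass
except TypeError as e:
    print(e)  # "iteration over a 0-d array"

index = int(action)  # or action.item(); extracts the plain Python scalar
print(index)  # 5
```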

How should I specify _action_spec to solve my issue?

Edit

Scenario 3

If I change the shape to shape=(1,) (suggested by norok2):

self._action_spec = array_spec.BoundedArraySpec(
            shape=(1,), dtype=np.int32, minimum=0, maximum=highestIndex, name='action')

I can build the q-network, but when I try to build the actual agent in the next step via

optimizer = tf.compat.v1.train.AdamOptimizer()

train_step_counter = tf.compat.v2.Variable(0)

tf_agent = dqn_agent.DqnAgent(
        train_env.time_step_spec(),
        train_env.action_spec(),
        q_network=q_net,
        optimizer=optimizer,
        td_errors_loss_fn = tf_agents.utils.common.element_wise_squared_loss,
        train_step_counter=train_step_counter)

tf_agent.initialize()

I get the following error:

ValueError: Only scalar actions are supported now, but action spec is: BoundedTensorSpec(shape=(1,), dtype=tf.int32, name='action', minimum=array(0), maximum=array(13207))
  In call to configurable 'DqnAgent' (<class 'tf_agents.agents.dqn.dqn_agent.DqnAgent'>)
