TF-Agents action_spec: how do I define the correct shape for a discrete action space?

Published on 2025-01-22 11:08:21

Scenario 1

My custom environment has the following _action_spec:

self._action_spec = array_spec.BoundedArraySpec(
            shape=(highestIndex+1,), dtype=np.int32, minimum=0, maximum=highestIndex, name='action')

Therefore my actions are represented as simple integer values between 0 and highestIndex. utils.validate_py_environment(env, episodes=5) works perfectly fine and steps through my environment.
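For context, `shape=(highestIndex+1,)` does not describe a single integer between 0 and `highestIndex`; it describes a vector of `highestIndex+1` integers per step. The distinction can be sketched with plain NumPy (the value 13207 is taken from the error message in Scenario 3 below):

```python
import numpy as np

highestIndex = 13207  # value visible in the DqnAgent error below

# shape=(highestIndex+1,): every action is a whole vector of integers
vector_action = np.zeros(highestIndex + 1, dtype=np.int32)
print(vector_action.shape)  # (13208,)

# shape=(): every action is a single 0-d integer array, i.e. a scalar
scalar_action = np.array(7, dtype=np.int32)
print(scalar_action.shape)  # ()
print(scalar_action.ndim)   # 0
```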

I want to train a DQN. Therefore, I build a q_network:

q_net = q_network.QNetwork(
        train_env.observation_spec(),
        train_env.action_spec(),
        fc_layer_params=fc_layer_params)

Unfortunately, I get the following error when I call these lines:

ValueError: Network only supports action_specs with shape in [(), (1,)])
  In call to configurable 'QNetwork' (<class 'tf_agents.networks.q_network.QNetwork'>)

Scenario 2

I tried to change the shape of _action_spec to () (like in the following tutorials for a similar environment: https://www.tensorflow.org/agents/tutorials/2_environments_tutorial or https://towardsdatascience.com/tf-agents-tutorial-a63399218309):

self._action_spec = array_spec.BoundedArraySpec(
            shape=(), dtype=np.int32, minimum=0, maximum=highestIndex, name='action')

After these changes I can create the q-network but these changes lead to the following error when utils.validate_py_environment(env, episodes=5) or driver.run() is called:

TypeError: iteration over a 0-d array
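That TypeError comes from NumPy rather than TF-Agents: with `shape=()` the action reaches the environment as a 0-d array, and 0-d arrays cannot be iterated over. A minimal reproduction, with the usual workaround of unwrapping the scalar first (the for-loop here only stands in for whatever iteration the environment's `_step` does):

```python
import numpy as np

action = np.array(5, dtype=np.int32)  # what a shape=() spec delivers

try:
    for _ in action:  # iterating a 0-d array raises
        pass
except TypeError as e:
    print(e)  # "iteration over a 0-d array"

index = int(action)  # or action.item(); extracts the plain Python scalar
print(index)  # 5
```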

How should I specify _action_spec to solve my issue?

Edit

Scenario 3

If I change the shape to shape=(1,) (suggested by norok2):

self._action_spec = array_spec.BoundedArraySpec(
            shape=(1,), dtype=np.int32, minimum=0, maximum=highestIndex, name='action')

I can build the q-network, but when I try to build the actual agent in the next step via

optimizer = tf.compat.v1.train.AdamOptimizer()

train_step_counter = tf.compat.v2.Variable(0)

tf_agent = dqn_agent.DqnAgent(
        train_env.time_step_spec(),
        train_env.action_spec(),
        q_network=q_net,
        optimizer=optimizer,
        td_errors_loss_fn = tf_agents.utils.common.element_wise_squared_loss,
        train_step_counter=train_step_counter)

tf_agent.initialize()

I get the following error:

ValueError: Only scalar actions are supported now, but action spec is: BoundedTensorSpec(shape=(1,), dtype=tf.int32, name='action', minimum=array(0), maximum=array(13207))
  In call to configurable 'DqnAgent' (<class 'tf_agents.agents.dqn.dqn_agent.DqnAgent'>)
