Why are mean_q and MAE logged as NaN for a keras-rl2 DQN agent?

Posted 2025-01-25 14:11:30


I copied the code over from https://github.com/keras-rl/keras-rl/blob/master/examples/dqn_atari.py, but only the rewards and the number of steps are logged, and the error metrics are all NaN:

memory = SequentialMemory(limit=1000000, window_length=WINDOW_LENGTH)
processor = AtariProcessor()
policy = LinearAnnealedPolicy(EpsGreedyQPolicy(), attr='eps', value_max=1., value_min=.1,
                              value_test=.05, nb_steps=1000000)

dqn = DQNAgent(model=model1, nb_actions=nb_actions, policy=policy, memory=memory,
               processor=processor, nb_steps_warmup=50000, gamma=.99,
               target_model_update=10000, train_interval=4, delta_clip=1.)

adamOptimizer = adam_v2.Adam(learning_rate=0.00025)
dqn.compile(adamOptimizer, metrics=['mae'])

env_name = 'PongNoFrameskip-v4'
weights_filename = 'dqn_{}_weights.h5f'.format(env_name)
checkpoint_weights_filename = 'dqn_' + env_name + '_weights_{step}.h5f'
log_filename = 'dqn_{}_log.json'.format(env_name)

callbacks = [ModelIntervalCheckpoint(checkpoint_weights_filename, interval=250000)]
callbacks += [FileLogger(log_filename, interval=100)]

trainLog = dqn.fit(env, callbacks=callbacks, nb_steps=1750000, log_interval=10000)
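For context, the snippet above leaves out the imports and the environment/model setup it relies on. A paraphrased sketch of those missing pieces, based on the rest of the linked dqn_atari.py example (the exact model definition and the adam_v2 import in the original post may differ slightly), would be:

# Sketch of the imports and setup the snippet above assumes, paraphrased from
# the linked dqn_atari.py example (not verbatim from the original post).
import gym
import numpy as np
from PIL import Image
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Convolution2D, Permute

from rl.agents.dqn import DQNAgent
from rl.policy import LinearAnnealedPolicy, EpsGreedyQPolicy
from rl.memory import SequentialMemory
from rl.core import Processor
from rl.callbacks import FileLogger, ModelIntervalCheckpoint
# adam_v2 comes from the Keras optimizers module; the exact import path
# depends on the installed Keras/TensorFlow version.

INPUT_SHAPE = (84, 84)
WINDOW_LENGTH = 4

class AtariProcessor(Processor):
    # Resize and grayscale frames, rescale pixel values, clip rewards.
    def process_observation(self, observation):
        img = Image.fromarray(observation).resize(INPUT_SHAPE).convert('L')
        return np.array(img).astype('uint8')

    def process_state_batch(self, batch):
        return batch.astype('float32') / 255.

    def process_reward(self, reward):
        return np.clip(reward, -1., 1.)

env = gym.make('PongNoFrameskip-v4')
nb_actions = env.action_space.n

# Standard DQN Atari CNN; Permute turns the (frames, height, width) window
# into channels-last input for the convolutions.
model1 = Sequential([
    Permute((2, 3, 1), input_shape=(WINDOW_LENGTH,) + INPUT_SHAPE),
    Convolution2D(32, (8, 8), strides=(4, 4), activation='relu'),
    Convolution2D(64, (4, 4), strides=(2, 2), activation='relu'),
    Convolution2D(64, (3, 3), strides=(1, 1), activation='relu'),
    Flatten(),
    Dense(512, activation='relu'),
    Dense(nb_actions, activation='linear'),
])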

I only let it train for a few thousand steps just for show, and in the dqn_{}_log.json file the mean_q, the loss, and the mae are all NaN. Below is a screenshot of the JSON log file content:

[Screenshot: DQN agent training log file]
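If you want to check the log without a screenshot, the FileLogger output can be inspected directly. Here is a minimal sketch, assuming the log filename produced by the snippet above; note that with nb_steps_warmup=50000 and only a few thousand steps of training, loss, mae and mean_q are expected to stay NaN, since no gradient update has run yet:

import json
import math

# FileLogger writes a dict that maps each logged quantity to a list with one
# entry per finished episode.
with open('dqn_PongNoFrameskip-v4_log.json') as f:
    log = json.load(f)

print(sorted(log.keys()))
for name, values in log.items():
    nan_entries = sum(1 for v in values if isinstance(v, float) and math.isnan(v))
    print(name, '-', len(values), 'episodes,', nan_entries, 'NaN entries')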

And when the callback history keys are printed, loss and mae are not included:

print(trainLog.history.keys())

output: dict_keys(['episode_reward', 'nb_episode_steps', 'nb_steps'])


Comments (1)

怀念你的温柔 2025-02-01 14:11:30


They didn't implement it (and probably won't, since the library is now archived). However, I solved this by modifying the source code in keras-rl2/rl/core.py, around line 219 or so, by adding the code I put between the ######## markers:

if done:
    # We are in a terminal state but the agent hasn't yet seen it. We therefore
    # perform one more forward-backward call and simply ignore the action before
    # resetting the environment. We need to pass in `terminal=False` here since
    # the *next* state, that is the state of the newly reset environment, is
    # always non-terminal by convention.
    self.forward(observation)
    self.backward(0., terminal=False)

    # This episode is finished, report and reset.

    episode_logs = {
        'episode_reward': episode_reward,
        'nb_episode_steps': episode_step,
        'nb_steps': self.step,
        #################################
        # Added: splice the compiled Keras metrics (e.g. loss, mae, mean_q)
        # and the accumulated env info into the per-episode logs.
        **{name: metrics[i] for i, name in enumerate(self.metrics_names)},
        'info': accumulated_info,
        #################################
    }
    callbacks.on_episode_end(episode, episode_logs)

    episode += 1
    observation = None
    episode_step = None
    episode_reward = None

I also added the info, just in case. Don't worry, this won't modify the agent's training process or behavior; we are just retrieving additional information.
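As a quick sanity check (a sketch assuming the patch above is applied and that dqn was compiled with metrics=['mae'] as in the question), the metric names should now show up in the history returned by fit():

# Re-run training after patching rl/core.py; the per-episode logs passed to
# the callbacks now also carry the agent's metric values.
trainLog = dqn.fit(env, callbacks=callbacks, nb_steps=100000,  # past nb_steps_warmup
                   log_interval=10000)

print(dqn.metrics_names)        # the names spliced into episode_logs by the patch
                                # (e.g. 'loss', 'mae', 'mean_q')
print(trainLog.history.keys())  # should now list those metrics and 'info' alongside
                                # episode_reward, nb_episode_steps and nb_steps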
