I copied the code over from https://github.com/keras-rl/keras-rl/blob/master/examples/dqn_atari.py, but only the rewards and the number of steps are logged, and the error metrics are all NaN:
# Replay memory, Atari frame preprocessing, and annealed epsilon-greedy exploration,
# following the linked dqn_atari.py example
memory = SequentialMemory(limit=1000000, window_length=WINDOW_LENGTH)
processor = AtariProcessor()
policy = LinearAnnealedPolicy(EpsGreedyQPolicy(), attr='eps', value_max=1., value_min=.1,
                              value_test=.05, nb_steps=1000000)

# DQN agent: 50000 warm-up steps before training, target network updated every 10000 steps
dqn = DQNAgent(model=model1, nb_actions=nb_actions, policy=policy, memory=memory,
               processor=processor, nb_steps_warmup=50000, gamma=.99,
               target_model_update=10000, train_interval=4, delta_clip=1.)
adamOptimizer = adam_v2.Adam(learning_rate=0.00025)
dqn.compile(adamOptimizer, metrics=['mae'])

env_name = 'PongNoFrameskip-v4'
weights_filename = 'dqn_{}_weights.h5f'.format(env_name)
checkpoint_weights_filename = 'dqn_' + env_name + '_weights_{step}.h5f'
log_filename = 'dqn_{}_log.json'.format(env_name)

# Checkpoint the weights periodically and log training metrics to a JSON file
callbacks = [ModelIntervalCheckpoint(checkpoint_weights_filename, interval=250000)]
callbacks += [FileLogger(log_filename, interval=100)]
trainLog = dqn.fit(env, callbacks=callbacks, nb_steps=1750000, log_interval=10000)
I only let it train for a few thousand steps, just for show, and in the dqn_{}_log.json file the mean_q, loss, and mae are all NaN. Below is a screenshot of the JSON log file content:
[Screenshot: dqn agent training log file]
And when the callback history keys are printed, loss and mae are not included:
print(trainLog.history.keys())
Output: dict_keys(['episode_reward', 'nb_episode_steps', 'nb_steps'])
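For reference, the same values can be read back from the FileLogger output directly rather than from the screenshot; this is only a minimal sketch, assuming the file written via log_filename above:

import json

# The filename follows from log_filename = 'dqn_{}_log.json'.format(env_name)
with open('dqn_PongNoFrameskip-v4_log.json') as f:
    log = json.load(f)

print(sorted(log.keys()))   # per-episode lists, e.g. 'episode_reward', 'loss', 'mae', 'mean_q'
print(log['loss'][:5])      # in the run described above, these entries are all NaN
print(log['mean_q'][:5])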
Comments (1)
They didn't implement it (and probably won't, since the library is now archived). However, I solved this by modifying the source code in keras-rl2/rl/core.py, at line 219 or thereabouts: add the code I put between the ######## markers. I also added the info just in case. Don't worry, this won't change the agent's training process or behavior; we are just retrieving additional information.
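The exact lines added between the ######## markers aren't reproduced here. As an illustration of the same idea without patching the library, a custom keras-rl callback can collect the per-step metrics that rl/core.py already passes to callbacks; this is only a sketch, relying on the logs['metrics'] and model.metrics_names conventions that FileLogger itself uses:

import numpy as np
from rl.callbacks import Callback

class MetricsHistory(Callback):
    """Average the per-step training metrics (loss, mae, mean_q, ...) over each
    episode so they can be inspected after fit(), without editing rl/core.py."""

    def on_train_begin(self, logs=None):
        # keras-rl sets self.model to the agent, as FileLogger assumes
        self.metrics_names = self.model.metrics_names
        self.history = {name: [] for name in self.metrics_names}
        self._episode_metrics = []

    def on_step_end(self, step, logs=None):
        # core.py puts the raw metrics returned by backward() into the step logs
        self._episode_metrics.append(logs['metrics'])

    def on_episode_end(self, episode, logs=None):
        # NaN-aware mean: metrics are NaN on steps where no training update ran
        means = np.nanmean(self._episode_metrics, axis=0)
        for name, value in zip(self.metrics_names, means):
            self.history[name].append(float(value))
        self._episode_metrics = []

Usage would be something like metrics_history = MetricsHistory(), then dqn.fit(env, callbacks=callbacks + [metrics_history], ...), after which metrics_history.history holds per-episode loss, mae, and mean_q values.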