How do I train an RL agent in a custom environment?
I have created a custom space that extends OpenAI's gym.Space. I need this space because I need an action space whose values sum to a fixed total; with it, I can scale the output and meet my requirement.
import gym
import numpy as np
from gym.spaces import Space

class ProbabilityBox(Space):
    """
    Values add up to 1 and each value lies between 0 and 1.
    """
    def __init__(self, size=None):
        assert isinstance(size, int) and size > 0
        self.size = size
        # The samples are float vectors of shape (size,), not scalar int64.
        super().__init__((size,), np.float64)

    def sample(self):
        # A Dirichlet draw lies on the probability simplex by construction.
        return np.around(np.random.dirichlet(np.ones(self.size), size=1), decimals=2)[0]

    def contains(self, x):
        if not isinstance(x, (list, tuple, np.ndarray)):
            return False
        x = np.asarray(x)
        # An exact float comparison with 1 is fragile (and sample() rounds
        # to 2 decimals), so allow a small tolerance.
        return np.isclose(np.sum(x), 1) and np.all(x >= 0) and np.all(x <= 1)

    def __repr__(self):
        return f"ProbabilityBox({self.size})"

    def __eq__(self, other):
        return isinstance(other, ProbabilityBox) and self.size == other.size
I am using this space as the action space of a custom environment. I am unable to train this agent with stable-baselines3 because it does not support custom spaces.

- Is there an alternate way to model this scenario so that I can work with stable-baselines3?
- What other libraries/frameworks can I use to train an RL agent with support for custom spaces?
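One numerical detail worth noting about membership checks like the one above: comparing a float sum against 1 with exact equality is fragile, because rounded probabilities rarely sum to the target exactly; `np.isclose` is the robust test. A small demonstration:

```python
import numpy as np

# Each entry here is a valid probability, yet exact equality on the
# float sum can fail due to rounding error in binary floating point.
x = np.array([0.1, 0.2, 0.7])
exact = (np.sum(x) == 1.0)          # may be False despite summing to 1 on paper
close = np.isclose(np.sum(x), 1.0)  # tolerance-based check, reliably True
```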
Comments (2)
stable-baselines3 does not support action spaces other than Discrete / MultiDiscrete / Box, and there is no real need for custom action spaces: your action(s) are fully determined by the output of your neural network, which is either a natural/real number or a vector of them, so the three classes above fully cover them.
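Following that reasoning, one way to stay within the Box action space that stable-baselines3 supports is to let the agent emit an unconstrained vector and normalize it onto the probability simplex inside the environment. A minimal numpy sketch (the softmax mapping is a suggestion, not part of the original question):

```python
import numpy as np

def to_simplex(action: np.ndarray) -> np.ndarray:
    """Map an unconstrained action vector onto the probability simplex
    (non-negative entries summing to 1) via a numerically stable softmax."""
    z = action - np.max(action)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Any Box(-inf, inf, shape=(n,)) action the agent emits becomes a
# valid probability vector:
probs = to_simplex(np.array([2.0, -1.0, 0.5]))
```

The agent then learns through the normalization: larger raw outputs translate to larger probability mass, so nothing is lost by dropping the custom space.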
Stable-baselines does support custom envs. See the docs.
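To make the two answers concrete, here is an illustrative custom environment (the class name, the toy reward, and the one-step episode are all invented for the sketch) that keeps the action space as a Box, which stable-baselines3 supports, and maps raw actions onto the simplex inside `step()`. It follows the classic Gym API in which `reset()` returns only the observation:

```python
import gym
import numpy as np
from gym import spaces

class SimplexEnv(gym.Env):
    """Toy env: the agent emits an unconstrained vector; step() softmaxes
    it onto the probability simplex and rewards closeness to the state."""
    def __init__(self, size=3):
        super().__init__()
        self.size = size
        # Box is one of the action-space types SB3 accepts.
        self.action_space = spaces.Box(low=-np.inf, high=np.inf,
                                       shape=(size,), dtype=np.float32)
        self.observation_space = spaces.Box(low=0.0, high=1.0,
                                            shape=(size,), dtype=np.float32)

    def reset(self):
        self.state = np.full(self.size, 1.0 / self.size, dtype=np.float32)
        return self.state

    def step(self, action):
        z = np.exp(action - np.max(action))
        probs = z / z.sum()                          # softmax onto the simplex
        reward = -float(np.square(probs - self.state).sum())  # toy reward
        self.state = probs.astype(np.float32)
        done = True                                  # one-step toy episode
        return self.state, reward, done, {}
```

With the action constraint handled inside the environment, the class can be trained with any SB3 algorithm that handles Box actions (e.g. PPO or SAC).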