How do I train an RL agent in a custom environment?
I have created a custom space that extends OpenAI's gym.Space. I need this space because I need an action space whose values sum to a fixed total; with it, I can scale the output and meet my requirement.
import gym
import numpy as np
from gym.spaces import Space

class ProbabilityBox(Space):
    """
    Values add up to 1 and each value lies between 0 and 1.
    """
    def __init__(self, size=None):
        assert isinstance(size, int) and size > 0
        self.size = size
        # The samples are float vectors of shape (size,), not scalar int64.
        super().__init__((size,), np.float64)

    def sample(self):
        # A Dirichlet draw lies on the probability simplex by construction.
        return np.around(np.random.dirichlet(np.ones(self.size), size=1), decimals=2)[0]

    def contains(self, x):
        if not isinstance(x, (list, tuple, np.ndarray)):
            return False
        x = np.asarray(x)
        # An exact float comparison with 1 is fragile (and sample() rounds
        # to 2 decimals), so allow a small tolerance.
        return np.isclose(np.sum(x), 1) and np.all(x >= 0) and np.all(x <= 1)

    def __repr__(self):
        return f"ProbabilityBox({self.size})"

    def __eq__(self, other):
        return isinstance(other, ProbabilityBox) and self.size == other.size
I am using this space as the action space of a custom environment. I am unable to train this agent with stable-baselines3 because it does not support custom spaces.

- Is there an alternate way to model this scenario so that I can work with stable-baselines3?
- What other libraries/frameworks can I use to train an RL agent with support for custom spaces?
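One numerical detail worth noting about membership checks like the one above: comparing a float sum against 1 with exact equality is fragile, because rounded probabilities rarely sum to the target exactly; `np.isclose` is the robust test. A small demonstration:

```python
import numpy as np

# Each entry here is a valid probability, yet exact equality on the
# float sum can fail due to rounding error in binary floating point.
x = np.array([0.1, 0.2, 0.7])
exact = (np.sum(x) == 1.0)          # may be False despite summing to 1 on paper
close = np.isclose(np.sum(x), 1.0)  # tolerance-based check, reliably True
```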
Comments (2)
stable-baselines3 does not support action spaces other than Discrete / MultiDiscrete / Box, and there is no real need for custom action spaces: your action(s) are fully determined by the output of your neural network, which is either a natural/real number or a vector of them, so the three classes above fully cover them.
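Following that reasoning, one way to stay within the Box action space that stable-baselines3 supports is to let the agent emit an unconstrained vector and normalize it onto the probability simplex inside the environment. A minimal numpy sketch (the softmax mapping is a suggestion, not part of the original question):

```python
import numpy as np

def to_simplex(action: np.ndarray) -> np.ndarray:
    """Map an unconstrained action vector onto the probability simplex
    (non-negative entries summing to 1) via a numerically stable softmax."""
    z = action - np.max(action)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Any Box(-inf, inf, shape=(n,)) action the agent emits becomes a
# valid probability vector:
probs = to_simplex(np.array([2.0, -1.0, 0.5]))
```

The agent then learns through the normalization: larger raw outputs translate to larger probability mass, so nothing is lost by dropping the custom space.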
Stable-baselines does support custom envs. See the docs.
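To make the two answers concrete, here is an illustrative custom environment (the class name, the toy reward, and the one-step episode are all invented for the sketch) that keeps the action space as a Box, which stable-baselines3 supports, and maps raw actions onto the simplex inside `step()`. It follows the classic Gym API in which `reset()` returns only the observation:

```python
import gym
import numpy as np
from gym import spaces

class SimplexEnv(gym.Env):
    """Toy env: the agent emits an unconstrained vector; step() softmaxes
    it onto the probability simplex and rewards closeness to the state."""
    def __init__(self, size=3):
        super().__init__()
        self.size = size
        # Box is one of the action-space types SB3 accepts.
        self.action_space = spaces.Box(low=-np.inf, high=np.inf,
                                       shape=(size,), dtype=np.float32)
        self.observation_space = spaces.Box(low=0.0, high=1.0,
                                            shape=(size,), dtype=np.float32)

    def reset(self):
        self.state = np.full(self.size, 1.0 / self.size, dtype=np.float32)
        return self.state

    def step(self, action):
        z = np.exp(action - np.max(action))
        probs = z / z.sum()                          # softmax onto the simplex
        reward = -float(np.square(probs - self.state).sum())  # toy reward
        self.state = probs.astype(np.float32)
        done = True                                  # one-step toy episode
        return self.state, reward, done, {}
```

With the action constraint handled inside the environment, the class can be trained with any SB3 algorithm that handles Box actions (e.g. PPO or SAC).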