How to discretise continuous observation and action spaces in Python?
My professor has asked me to apply a Policy Iteration method on the Pendulum-V1 gym environment in OpenAI.
Pendulum-V1 has the following environment:

Observation

Type: Box(3)

| Num | Observation | Min  | Max |
|-----|-------------|------|-----|
| 0   | cos(theta)  | -1.0 | 1.0 |
| 1   | sin(theta)  | -1.0 | 1.0 |
| 2   | theta dot   | -8.0 | 8.0 |
Actions

Type: Box(1)

| Num | Action       | Min  | Max |
|-----|--------------|------|-----|
| 0   | Joint effort | -2.0 | 2.0 |
From my understanding, Policy Iteration requires a discrete action space, a discrete observation space, and known transition probabilities, as in the Frozen Lake OpenAI environment. I know there are methods designed for Box-type data over a continuous range, but the requirement is to apply a "correct" Policy Iteration method and explain why it doesn't work.
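For concreteness, this is the kind of tabular model I understand Policy Iteration to need, in the same `P[s][a] = [(prob, next_state, reward, done), ...]` format that FrozenLake exposes through `env.unwrapped.P`. The function below is my own sketch (parameter names and defaults are my guesses), not tested code:

```python
import numpy as np

# Tabular policy iteration over a FrozenLake-style model:
# P[s][a] is a list of (prob, next_state, reward, done) tuples.
def policy_iteration(P, n_states, n_actions, gamma=0.9, tol=1e-8):
    policy = np.zeros(n_states, dtype=int)
    V = np.zeros(n_states)
    while True:
        # --- policy evaluation: iterate V under the current policy ---
        while True:
            delta = 0.0
            for s in range(n_states):
                v = sum(p * (r + gamma * V[s2] * (not done))
                        for p, s2, r, done in P[s][policy[s]])
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < tol:
                break
        # --- policy improvement: act greedily with respect to V ---
        stable = True
        for s in range(n_states):
            q = [sum(p * (r + gamma * V[s2] * (not done))
                     for p, s2, r, done in P[s][a])
                 for a in range(n_actions)]
            best = int(np.argmax(q))
            if best != policy[s]:
                stable = False
            policy[s] = best
        if stable:
            return policy, V
```

On FrozenLake this would be called with something like `policy_iteration(env.unwrapped.P, env.observation_space.n, env.action_space.n)`, but Pendulum-V1 has no such `P` table, which is the part I am stuck on.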
Does anyone have a source, know of a code repo, or could you help me with how I would discretise the action and observation data and apply Policy Iteration to it? Everything I have read tells me this is a bad way to solve the problem, and I cannot find anyone who has actually implemented this method on Pendulum-V1.
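What I have in mind is something along these lines: bin theta and theta dot, bin the torque, and build a deterministic transition table by forcing the pendulum's internal state to each bin centre before stepping. The bin counts, the use of `env.unwrapped.state`, and the helper names are my own guesses, so please correct me if this is the wrong approach:

```python
import gym
import numpy as np

N_THETA, N_THETA_DOT, N_ACTIONS = 31, 31, 5   # assumed bin counts

theta_centres = np.linspace(-np.pi, np.pi, N_THETA)
theta_dot_centres = np.linspace(-8.0, 8.0, N_THETA_DOT)
torques = np.linspace(-2.0, 2.0, N_ACTIONS)   # discrete torque levels

def obs_to_state(obs):
    """Map [cos(theta), sin(theta), theta_dot] to one state index
    by snapping to the nearest bin centre."""
    theta = np.arctan2(obs[1], obs[0])
    i = int(np.argmin(np.abs(theta_centres - theta)))
    j = int(np.argmin(np.abs(theta_dot_centres - obs[2])))
    return i * N_THETA_DOT + j

def build_model(env):
    """Deterministic transition table in the FrozenLake P[s][a] format:
    one (prob=1.0, next_state, reward, done) entry per state-action pair."""
    n_states = N_THETA * N_THETA_DOT
    P = {s: {a: [] for a in range(N_ACTIONS)} for s in range(n_states)}
    for i, th in enumerate(theta_centres):
        for j, thd in enumerate(theta_dot_centres):
            s = i * N_THETA_DOT + j
            for a, torque in enumerate(torques):
                env.reset()
                # Pendulum stores its state as (theta, theta_dot);
                # overwrite it so we can probe the dynamics from this bin.
                env.unwrapped.state = np.array([th, thd])
                obs, reward, *rest = env.step(np.array([torque]))
                P[s][a] = [(1.0, obs_to_state(obs), float(reward), False)]
    return P, n_states

env = gym.make("Pendulum-v1")
P, n_states = build_model(env)
policy, V = policy_iteration(P, n_states, N_ACTIONS)   # from the sketch above
```

I assume the quality of the resulting policy depends heavily on how fine the bins are, which may be what the "explain why it doesn't work" part of the assignment is getting at, but I would like to confirm that this is at least a sensible way to set the problem up.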