Q-table representation with nested lists as states and tuples as actions
How can I create a Q-table when my states are lists and actions are tuples?
Example of states for N = 3
[[1], [2], [3]]
[[1], [2, 3]]
[[1], [3, 2]]
[[2], [3, 1]]
[[1, 2, 3]]
Example of actions for those states
[[1], [2], [3]] -> (1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)
[[1], [2, 3]] -> (1, 2), (2, 0), (2, 1)
[[1], [3, 2]] -> (1, 3), (3, 0), (3, 1)
[[2], [3, 1]] -> (2, 3), (3, 0), (3, 2)
[[1, 2, 3]] -> (1, 0)
I was wondering about something like this:
# q_table = {state: {action: q_value}}
But I don't think that's a good design.
Comments (2)
1. Should your states really be of type list? list is a mutable type; tuple is the equivalent immutable type. Do you mutate your states during learning? I doubt it. In any case, if you use a list, you cannot use it as a dictionary key (because it is mutable); a conversion sketch follows after this list.
2. Otherwise, this is a pretty good representation. In a reinforcement learning context, you'll want to look up the Q-value of a given state-action pair and take the max (or argmax) over the actions available in a state. Your representation allows you to do both of these with minimal complexity, and is pretty clear. So it is a good representation.
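For point 1, a minimal sketch of making states hashable (the helper name to_key is my own choice, not from the question):

    # Convert a list-of-lists state into a tuple of tuples so it can be used
    # as a dictionary key (lists are not hashable, tuples are).
    def to_key(state):
        return tuple(tuple(inner) for inner in state)

    state = [[1], [2, 3]]
    q_table = {to_key(state): {(1, 2): 0.0, (2, 0): 0.0, (2, 1): 0.0}}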
Using a nested dictionary is actually a reasonable design choice for custom tabular reinforcement learning---it's called tabular for a reason :)
You could use defaultdict to initialize the q-table to a certain value, e.g., 0.
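A sketch of what that could look like, with every unseen state/action pair defaulting to 0.0:

    from collections import defaultdict

    # Outer dict maps a (hashable) state key to an inner dict mapping
    # action -> Q-value; unseen actions default to 0.0.
    q_table = defaultdict(lambda: defaultdict(float))

    q_table[((1,), (2, 3))][(1, 2)] += 0.5  # state key is to_key([[1], [2, 3]])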
Or without defaultdict, you could pre-populate the table explicitly:
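For instance, as a sketch using two of the N = 3 states from the question (the rest would be added the same way):

    # Pre-populate the table from known states and their legal actions.
    states_and_actions = {
        ((1,), (2,), (3,)): [(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)],
        ((1,), (2, 3)):     [(1, 2), (2, 0), (2, 1)],
    }
    q_table = {s: {a: 0.0 for a in acts} for s, acts in states_and_actions.items()}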
It is then convenient to perform an update by taking the max over the next state's action values, with something like the following.
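A sketch of a standard Q-learning update; alpha, gamma, reward, state, action, and next_state are assumed to come from your training loop:

    # One Q-learning step:
    # q(s, a) <- q(s, a) + alpha * (reward + gamma * max_a' q(s', a') - q(s, a))
    next_qs = q_table[next_state]
    best_next = max(next_qs.values()) if next_qs else 0.0  # terminal/unseen next state -> 0.0
    q_table[state][action] += alpha * (reward + gamma * best_next - q_table[state][action])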
One thing I'd just watch out for is that if you train an agent using a Q-table like this, it will pick the same action each time whenever all the values for the actions are equal (such as right after the Q-table is initialized).
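One way around that (a sketch, with a helper name of my own) is to break ties randomly when picking the greedy action:

    import random

    # Pick a greedy action, breaking ties between equally valued actions at
    # random instead of always returning the first one.
    def greedy_action(action_values):
        best = max(action_values.values())
        return random.choice([a for a, q in action_values.items() if q == best])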
Finally, if you don't want to use dictionaries, you can just map state and action tuples to indices, store the mapping in a dictionary, and use a lookup when you pass the state/action to your environment implementation. You can then just use them as indices of a 2d numpy array.
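A sketch of that approach, using the N = 3 states and the union of the actions listed in the question as the precomputed state and action lists:

    import numpy as np

    # Map every state key and action tuple to an integer index once,
    # then store the Q-values in a 2-D array.
    states = [((1,), (2,), (3,)), ((1,), (2, 3)), ((1,), (3, 2)),
              ((2,), (3, 1)), ((1, 2, 3),)]
    actions = [(1, 0), (1, 2), (1, 3), (2, 0), (2, 1), (2, 3),
               (3, 0), (3, 1), (3, 2)]

    state_index = {s: i for i, s in enumerate(states)}
    action_index = {a: j for j, a in enumerate(actions)}
    q_array = np.zeros((len(states), len(actions)))

    q_array[state_index[((1,), (2, 3))], action_index[(1, 2)]] += 0.5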