Generalization functions for Q-Learning
I have to do some work with Q-learning, about a guy who has to move furniture around a house (that's basically it). If the house is small enough, I can just have a matrix that represents actions/rewards, but as the house grows bigger that will not be enough, so I have to use some kind of generalization function instead. My teacher suggests I use not just one but several, so I can compare them. What do you guys recommend?
I heard that for this kind of situation people use Support Vector Machines and also Neural Networks. I'm not really in the field, so I can't tell. I have had some experience with Neural Networks in the past, but SVMs seem like a much harder subject to grasp. Are there any other methods I should look into? I know there must be a zillion of them, but I need something just to get started.
Thanks
1 Answer
Just as a refresher of terminology, in Q-learning you are trying to learn the Q-function, which depends on the state and action:

Q(S, A)
The standard version of Q-learning as taught in most classes tells you that for each S and A you need to learn a separate value in a table, and it tells you how to perform Bellman updates in order to converge to the optimal values.
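For reference, here is a minimal sketch of that tabular version in Python. The environment interface (reset, actions, step) and the learning-rate/discount/exploration parameters are assumptions made for illustration, not anything from the question:

```python
import random
from collections import defaultdict

def tabular_q_learning(env, episodes=1000, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Learn one Q-value per (state, action) pair with Bellman updates.

    `env` is a hypothetical environment with reset(), actions(state), and
    step(state, action) -> (next_state, reward, done)."""
    Q = defaultdict(float)  # Q[(S, A)], implicitly 0.0 until updated

    for _ in range(episodes):
        S = env.reset()
        done = False
        while not done:
            # Epsilon-greedy action selection over the tabular values.
            if random.random() < epsilon:
                A = random.choice(env.actions(S))
            else:
                A = max(env.actions(S), key=lambda a: Q[(S, a)])

            S_next, reward, done = env.step(S, A)

            # Bellman update toward reward + gamma * max_a' Q(S', a').
            best_next = 0.0 if done else max(Q[(S_next, a)] for a in env.actions(S_next))
            Q[(S, A)] += alpha * (reward + gamma * best_next - Q[(S, A)])

            S = S_next
    return Q
```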
Now, let's say that instead of a table you use a different function approximator. For example, let's try linear functions. Take your (S, A) pair and think of a bunch of features you can extract from it. One example of a feature is "Am I next to a wall?", another is "Will the action place the object next to a wall?", etc. Number these features f1(S, A), f2(S, A), ...
Now, try to learn the Q function as a linear function of those features:

Q(S, A) = w1 * f1(S, A) + w2 * f2(S, A) + ... + wN * fN(S, A)
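As a sketch of what that representation looks like in code (just the evaluation of Q, not how to learn the weights), with made-up feature functions along the lines of the examples above; the state/action attributes used here are hypothetical:

```python
def features(S, A):
    """Hand-crafted features of a (state, action) pair.

    `S.next_to_wall` and `A.places_object_next_to_wall` are made-up attributes
    of whatever state/action representation you end up using."""
    return [
        1.0,                                            # bias feature
        1.0 if S.next_to_wall else 0.0,                 # f1: "Am I next to a wall?"
        1.0 if A.places_object_next_to_wall else 0.0,   # f2: "Will the action put the object next to a wall?"
    ]

def q_value(w, S, A):
    """Q(S, A) approximated as the weighted sum w1*f1 + w2*f2 + ..."""
    return sum(wi * fi for wi, fi in zip(w, features(S, A)))
```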
How should you learn the weights w? Well, since this is homework, I'll let you think about it on your own.
However, as a hint, let's say that you have K possible states and M possible actions in each state. Let's say you define K*M features, each of which is an indicator of whether you are in a particular state and are going to take a particular action. So

Q(S, A) = w11 * f11(S, A) + w12 * f12(S, A) + ... + wKM * fKM(S, A)

where fij(S, A) = 1 exactly when S is the i-th state and A is the j-th action.
Now, notice that for any state/action pair only one feature will be 1 and the rest will be 0, so Q(S, A) will be equal to the corresponding w and you are essentially learning a table. You can therefore think of standard, tabular Q-learning as a special case of learning with these linear functions. So, think about what the normal Q-learning algorithm does and what you should do.
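To make that special case concrete, here is a tiny sketch of the indicator ("one-hot") features; numbering the states 0..K-1 and the actions 0..M-1 is an assumption made for illustration:

```python
def indicator_features(S, A, K, M):
    """One feature per (state, action) pair: a single 1 at the slot for (S, A)."""
    f = [0.0] * (K * M)
    f[S * M + A] = 1.0
    return f

def q_value_onehot(w, S, A, K, M):
    """With indicator features the dot product picks out exactly one weight,
    so the weight vector w behaves like a Q-table with K*M entries."""
    return sum(wi * fi for wi, fi in zip(w, indicator_features(S, A, K, M)))
```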
Hopefully you can find a small basis of features, much fewer than K*M, that will allow you to represent your space well.