极小极大算法
我有一个关于 Minimax 算法的简单问题:例如,对于 tic-tac-toe 游戏,如何确定每个玩家玩的效用函数?它不会自动执行此操作,是吗?我必须对游戏中的值进行硬编码,它无法自己学习它们,不是吗?
I have a simple question regarding the Minimax algorithm: for example for the tic-tac-toe game, how do I determine the utility function's for each player plays? It doesn't do that automatically, does it? I must hard-code the values in the game, it can't learn them by itself, does it?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
不,MiniMax 不会学习。它是暴力树搜索的更智能版本。
No, a MiniMax does not learn. It is a smarter version of a brute-force tree search.
通常您会直接实现效用函数。在这种情况下,算法不会学习如何玩游戏,它会使用您在实现中明确硬编码的信息。
然而,可以使用遗传编程(GP)或一些等效技术来自动推导效用函数。在这种情况下,您不必编码任何显式策略。相反,进化会发现自己的游戏方式。
您可以将您的 minimax 代码和 GP 代码组合成一个(可能非常慢)自适应程序,或者您可以先运行 GP,找到一个好的实用函数,然后将此函数添加到您的 minimax 代码中,就像您在任何手上一样- 编码函数。
Typically you would implement the utility function directly. In this case the algorithm would not learn how to play the game, it would use the information that you had explicitly hard-coded in the implementation.
However, it would be possible to use genetic programming (GP) or some equivalent technique to automatically derive a utility function. In this case you would not have to encode any explicit strategy. Instead the evolution would discover its own way of playing the game well.
You could either combine your minimax code and the GP code into a single (probably very slow) adaptive program, or you could run the GP first, find a good utility function and then add this function to your minimax code just as you would any hand-coded function.
Tic-Tac-Toe 足够小,可以将游戏运行到最后并指定 1 表示获胜,0 表示平局,-1 表示失败。
否则,您必须提供一个函数来启发式地确定仓位的值。例如,在国际象棋中,一个重要因素是材料的价值,但也包括谁控制中心或棋子移动的容易程度。
至于学习,你可以在位置的不同方面添加权重因素,并通过反复玩游戏来尝试优化这些因素。
Tic-Tac-Toe is small enough to run the game to the end and assign 1 for win, 0 for draw and -1 for lose.
Otherwise you have to provide a function which determines the value of a position heuristically. In chess for example a big factor is the value of the material, but also who controls the center or how easily the pieces can move.
As for learning, you can add weight factors to different aspects of the position and try to optimize those by repeatedly playing games.
如何确定每个游戏的效用函数?
仔细;-)这个文章展示了一个稍微有缺陷的评估函数(例如,它在展望可能的层数树中不够“深入”,或者无法捕捉某些董事会位置的相对强度的算法)会导致整体算法较弱(更容易失败的算法)。
它不能自己学习它们,不是吗?
不,它不能。然而,有一些方法可以让计算机了解棋盘位置的相对强度。例如,通过研究Donald Mitchie 和他的 MENACE 程序您将看到如何使用随机过程来学习棋盘,除了游戏规则外,无需任何先验知识。有趣的是,虽然这可以在计算机中实现,但由于游戏空间相对较小,而且由于各种对称性,只需要几百个彩色珠子和火柴盒。
在学习了如此酷的教计算机如何玩游戏的方法之后,我们可能不会像应用于井字游戏那样有兴趣回到 MinMax。毕竟MinMax是一种相对简单的决策树修剪方法,对于井字游戏的小游戏空间来说几乎不需要它。但是,如果我们必须;-) [回到 MinMax]...
我们可以查看与下一个游戏相关的“火柴盒”(即根本不深入),并使用与每个方块相关的珠子百分比,作为一个附加因素。然后,我们可以评估一棵传统的树,但只能进行 2 或 3 步深度移动(较浅的前瞻深度,通常会以失败或平局告终),并根据简单的 -1 (失败)、0(平局/未知)、+1(获胜)评级。然后,通过结合珠子百分比和简单评级(通过加法,当然不是乘法),我们能够以更类似于无法评估的情况下使用的方式有效地使用 MinMax游戏树结束。
底线:就 Tic-Tac-Toe 而言,当我们消除游戏的确定性本质时,MinMax 只会变得更有趣(例如帮助我们探索特定效用函数的有效性),这与简单评估完整的游戏相关。树。让游戏[数学上]变得有趣的另一种方法是与犯错误的对手一起玩......
How do determine the utility function for each play?
Carefully ;-) This article shows how a slightly flawed evaluation function (one for ex. which either doesn't go "deep" enough in looking ahead in the tree of possible plys, or one which fails to capture the relative strengh of some board positions) results in an overall weak algorithm (one that looses more often).
it can't learn them by itself, does it?
No, it doesn't. There are ways, however, to make the computer learn the relative strength of board positions. For example by looking into Donald Mitchie and his MENACE program you'll see how a stochastic process can be used to learn the board without any a priori knowledge but the rules of the game. The funny part is that while this can be implemented in computers, a few hundred colored beads and match boxes are all that is required, thanks to the relatively small size of the game space, and also thanks to various symmetries.
After learning such a cool way of teaching the computer how to play, we may not be so interested in going back to MinMax as applied to Tic-Tac-Toe. After all MinMax is a relatively simple way of pruning a decision tree, which is hardly needed with tic-tac-toe's small game space. But, if we must ;-) [go back to MinMax]...
We can look into the "matchbox" associated with the next play (i.e. not going deep at all), and use the percentage of beads associated with each square, as an additional factor. We can then evaluate a traditional tree, but only going, say 2 or 3 moves deep (a shallow look-ahead depth which would typically end in usually in losses or draws) and rate each next move on the basis of the simple -1 (loss), 0 (draw/unknown), +1 (win) rating. By then combining the beads percentage and the simple rating (by say addition, certainly not by multiplication), we are able to effectively use MinMax in a fashion that is more akin to the way it is used in cases when it is not possible to evaluate the game tree to its end.
Bottom line: In the case of Tic-Tac-Toe, MinMax only becomes more interesting (for example in helping us explore the effectiveness of a particular utility function) when we remove the deterministic nature of the game, associated with the easy evaluation the full tree. Another way of making the game [mathematically] interesting is to play with a opponent which makes mistakes...