Minimax 的 Alpha-beta 剪枝
我花了一整天的时间尝试实现极小极大,但没有真正理解它。现在,我想我了解极小极大值的工作原理,但不了解 alpha-beta 剪枝。
这是我对极小极大的理解:
生成所有可能的移动的列表,直到深度限制。
评估游戏场地对于底部每个节点的有利程度。
对于每个节点(从底部开始),如果层数为 max,则该节点的分数是其子节点的最高分数。如果层数为min,则该节点的得分为其子节点的最低得分。
如果您想要获得最高分数,请执行得分最高的移动;如果您想要获得最低分数
我对alpha-beta剪枝的理解是,如果父层是min并且你的节点的分数高于最低分数,那么你可以剪枝它,因为它不会影响结果。
然而,我不明白的是,如果你能计算出一个节点的分数,你将需要知道比该节点低一层的所有节点的分数(以我对极小极大的理解)。这意味着您仍将使用相同数量的 CPU 功率。
有人可以指出我做错了什么吗?这个答案(为白痴解释的Minimax)帮助我理解了minimax,但我不明白 alpha beta 修剪有什么帮助。
谢谢。
I have spent a whole day trying to implement minimax without really understanding it. Now, , I think I understand how minimax works, but not alpha-beta pruning.
This is my understanding of minimax:
Generate a list of all possible moves, up until the depth limit.
Evaluate how favorable a game field is for every node on the bottom.
For every node, (starting from the bottom), the score of that node is the highest score of it's children if the layer is max. If the layer is min, the score of that node is the lowest score of it's children.
Perform the move that has the highest score if you are trying to max it, or the lowest if you want the min score.
My understanding of alpha-beta pruning is that, if the parent layer is min and your node has a higher score than the minimum score, then you can prune it since it will not affect the result.
However, what I don't understand is, if you can work out the score of a node, you will need to know the score of all nodes on a layer lower than the node (in my understanding of minimax). Which means that you'llstill be using the same amount of CPU power.
Could anyone please point out what I am getting wrong? This answer ( Minimax explained for an idiot ) helped me understand minimax, but I don't get how alpha beta pruning would help.
Thank you.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
要了解 Alpha-Beta,请考虑以下情况。轮到白棋了,白棋试图使分数最大化,黑棋则试图使分数最小化。
白棋评估棋步 A、B、C,发现 C 的最佳得分为 20。现在考虑评估棋步 D 时会发生什么:
如果白棋选择 D 棋,我们需要考虑黑棋的反击。早期,我们发现黑色可以捕获白色皇后,并且由于丢失了皇后,该子树的最低分数为 5。然而,我们并没有考虑到所有黑人的反击。是否值得检查其余部分?不。
我们不在乎黑棋是否能得到低于 5 的分数,因为白棋走“C”可以将分数保持在 20。黑棋不会选择分数高于 5 的反击棋,因为他试图将分数最小化。得分,并且已经找到了得分为 5 的棋步。对于白棋来说,一旦 D 的 MIN(到目前为止为 5)低于 C(肯定是 20),棋步 C 就会优先于棋步 D。因此,我们“修剪”树的其余部分,弹出一个级别并评估白棋 E、F、G、H... 直到最后。
希望有帮助。
To understand Alpha-Beta, consider the following situation. It's Whites turn, white is trying to maximize the score, black is trying to minimize the score.
White evaluates move A,B, and C and finds the best score is 20 with C. Now consider what happens when evaluating move D:
If white selects move D, we need to consider counter-moves by black. Early on, we find black can capture the white queen, and that subtree gets a MIN score of 5 due to the lost queen. However, we have not considered all of blacks counter-moves. Is it worth checking the rest? No.
We don't care if black can get a score lower than 5 because whites move "C" could keep the score to 20. Black will not choose a counter-move with a score higher than 5 because he is trying to MINimize the score and has already found move with a score of 5. For white, move C is preferred over move D as soon as the MIN for D (5 so far) goes below that of C (20 for sure). So we "prune" the rest of the tree there, pop back up a level and evaluate white moves E,F,G,H.... to the end.
Hope that helps.
您不需要评估节点的整个子树来决定其值。 Alpha Beta 修剪使用两个动态计算的边界 alpha 和 beta 来限制节点可以采用的值。
Alpha 是通过博弈树的另一条路径保证最大玩家(无论最小玩家做什么)的最小值。该值用于在最小化级别执行截止(修剪)。当最小玩家发现最小节点的分数必然小于 alpha 时,它不需要评估该节点的任何更多选择,因为最大玩家已经有更好的移动(具有值 alpha 的移动)。
Beta 是保证最小玩家的最大值,用于在最大化级别执行截止。当最大玩家发现最大节点的分数必然大于 beta 时,它可以停止评估该节点的任何更多选择,因为最小玩家不会允许它采取这条路径,因为最小玩家已经有了一条路径这保证了 beta 值。
我写了一篇关于 Alpha Beta 剪枝的详细解释,它的伪代码和一些改进:http: //kartikkukreja.wordpress.com/2014/06/29/alphabetasearch/
You don't need to evaluate the entire subtree of a node to decide its value. Alpha Beta Pruning uses two dynamically computed bounds alpha and beta to bound the values that nodes can take.
Alpha is the minimum value that the max player is guaranteed (regardless of what the min player does) through another path through the game tree. This value is used to perform cutoffs (pruning) at the minimizing levels. When the min player has discovered that the score of a min node would necessarily be less than alpha, it need not evaluate any more choices from that node because the max player already has a better move (the one which has value alpha).
Beta is the maximum value that the min player is guaranteed and is used to perform cutoffs at the maximizing levels. When the max player has discovered that the score of a max node would necessarily be greater than beta, it can stop evaluating any more choices from that node because the min player would not allow it to take this path since the min player already has a path that guarantees a value of beta.
I've written a detailed explanation of Alpha Beta Pruning, its pseudocode and several improvements: http://kartikkukreja.wordpress.com/2014/06/29/alphabetasearch/
对mimimax(非常)简短的解释:
您(棋盘位置的评估者)可以选择下
n
步棋。您尝试所有这些并将棋盘位置交给(对手)评估员。您选择具有这些评估最小值的移动。该评估是您一开始必须评估的棋盘的评估。
对α-β-修剪的(非常)简短的解释:
您(棋盘位置的评估者)可以选择下
n
步棋。您一一尝试所有这些,并将董事会位置提供给(对手)评估员 - 但您也传递了您当前的评估(您的董事会)。m
步棋。他尝试了所有这些,并将新的棋盘位置(一一)提供给(他的对手)评估者,然后选择最大的一个。您选择具有这些评估最小值的移动。该评估是您一开始必须评估的棋盘的评估。
(Very) short explanation for mimimax:
You (the evaluator of a board position) have the choice of playing
n
moves. You try all of them and give the board positions to the (opponent) evaluator.You select the move that has the minimum of those evaluation. And that evaluation is the evaluation of the board you had to evaluate at the beginning.
(Very) short explanation for α-β-pruning:
You (the evaluator of a board position) have the choice of playing
n
moves. You try all of them one by one and give the board positions to the (opponent) evaluator - but you also pass along your current evaluation (of your board).m
moves. He tries all of them and gives the new board positions (one by one) to (his opponent) evaluator and then chooses the maximum one.You select the move that has the minimum of those evaluation. And that evaluation is the evaluation of the board you had to evaluate at the beginning.
这是一个简短的答案 - 您可以知道节点的值,而无需计算其所有子节点的精确值。
一旦我们知道从父节点玩家的角度来看,子节点不能比先前评估的兄弟节点更好,我们就可以停止评估子子树。 至少是这么糟糕。
Here's a short answer -- you can know the value of a node without computing the precise value of all its children.
As soon as we know that a child node cannot be better, from the perspective of the parent-node player, than the previously evaluated sibling nodes, we can stop evaluating the child subtree. It's at least this bad.
我认为你的问题暗示了对评估函数的误解
我不完全确定你的意思 在那里,但听起来不对。 评估函数 (EF) 通常是非常快速的、静态位置评估。这意味着它只需要查看一个位置并从中得出“结论”。 (IOW,您并不总是将分支评估为n层)
现在很多时候,评估确实是静态的,这意味着位置评估函数是完全确定性的。 这也是评估结果易于缓存的原因(因为每次评估职位时它们都是相同的)。
现在,对于例如国际象棋,通常与上述有相当多的明显/隐蔽的偏差:
根据游戏上下文,位置可能会被不同地评估(例如,确切的位置是否在游戏过程中较早出现;移动了多少步)没有典当移动/捕获发生,过路和易车机会)。解决这个问题的最常见的“技巧”是将该状态实际合并到“位置”中1
通常会为游戏的不同阶段(开局、中间、结束)选择不同的 EF );这会产生一些设计影响(更改 EF 时如何处理缓存评估?当不同层的 EF 不同时如何进行 alpha/beta 修剪?)
说实话,我不知道常见的国际象棋引擎如何解决后者(我只是在我的玩具引擎中避免使用它)
我会参考以下在线资源:
1就像“检查”/“僵局”条件一样,如果它们在评估函数之外没有特殊情况的话
I think your question hints at misunderstanding of the evaluation function
I'm not completely sure what you meant there, but it sounds wrong. The evaluation function (EF) is usually a very fast, static position evaluation. This means that it needs only look at a single position and reach a 'verdict' from that. (IOW, you don't always evaluate a branch to n plys)
Now many times, the evaluation truly is static, which means that the position evaluation function is completely deterministic. This is also the reason why the evaluation results are easily cacheable (since they will be the same each time a position is evaluated).
Now, for e.g. chess, there is usually quite a bit of overt/covert deviation from the above:
a position might be evaluated differently depending on game context (e.g. whether the exact position did occur earlier during the game; how many moves without pawn moves/captures have occurred, en passant and castling opportunity). The most common 'trick' to tackle this is by actually incorporating that state into the 'position'1
a different EF is usually selected for the different phases of the game (opening, middle, ending); this has some design impact (how to deal with cached evaluations when changing the EF? How to do alpha/beta pruning when the EF is different for different plies?)
To be honest, I'm not aware how common chess engines solve the latter (I simply avoided it for my toy engine)
I'd refer to an online resources like:
1just like the 'check'/'stalemate' conditions, if they are not special cased outside the evaluation function anyways