Utility functions in minimax search
Hi
I'm confused about how you determine the utility functions in a minimax search.
Could you explain it using any game that minimax search can be applied to?
Basically, I am asking how you determine the utility functions.
Cheers
Comments (2)
The utility value is just some arbitrary value that the player receives when arriving at a certain state in the game. For instance, in Tic-tac-toe, your utility function could simply be 1 for a win, 0 for a tie, or -1 for a loss.
Running minimax with this utility would, at best, find a sequence of actions that results in 1 (a win).
Another example would be chess (not that you can feasibly run minimax all the way to the end of a chess game). Say your utility function comes from a number based on the values of the pieces you have captured or lost.
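A minimal sketch of such a material-based utility, assuming the conventional piece values (pawn 1, knight 3, bishop 3, rook 5, queen 9), which the answer does not specify. The `material_utility` name and the one-letter piece encoding are hypothetical:

```python
# Conventional material values; an assumption, since the answer gives no numbers.
# The king is omitted because it is never captured.
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}


def material_utility(captured: str, lost: str) -> int:
    """Hypothetical utility: value of pieces captured minus value of pieces lost.

    Pieces are given as uppercase letters, e.g. "QN" for a queen and a knight.
    """
    gained = sum(PIECE_VALUES[p] for p in captured)
    given_up = sum(PIECE_VALUES[p] for p in lost)
    return gained - given_up


# Captured a queen and a knight (12), lost a rook and two pawns (7).
print(material_utility("QN", "RPP"))  # 5
```

Because this score can be computed for any position, not just at the end of the game, it can serve as the value at a search cut-off.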
Determining the utility value of a move in a given state depends on the programmer's experience and knowledge of the game.
Utility values of terminal states are fairly easy to determine. In Tic-tac-toe, for instance, a terminal state for player X is one where three Xs are aligned diagonally, vertically, or horizontally. Any move that creates such a state produces a terminal state, and you can write a function that checks for it. If the state is terminal, the function returns 1 or -1.
If your agent is player X and the function determines that player O has won, it returns -1. If it determines that X has won (the agent's own winning move), it returns 1.
If all cells are filled after the last possible move and nobody has won, the function returns 0.
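A minimal sketch of that terminal-state check, assuming the board is stored as a 9-character string read row by row, with a space for an empty cell. The `winner` and `terminal_utility` names are hypothetical:

```python
from typing import Optional

# The eight lines that win the game, as indexes into the board string
# (index = row * 3 + column).
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
         (0, 4, 8), (2, 4, 6)]             # diagonals


def winner(board: str) -> Optional[str]:
    """Return 'X' or 'O' if that player has three in a line, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def terminal_utility(board: str, agent: str = "X") -> Optional[int]:
    """1 if `agent` has won, -1 if the opponent has won, 0 for a full board
    with no winner, and None if the state is not terminal."""
    w = winner(board)
    if w is not None:
        return 1 if w == agent else -1
    if " " not in board:
        return 0
    return None


print(terminal_utility("XXX" + "OO " + "   "))  # 1: X has completed the top row
print(terminal_utility("XOX" + "XOO" + "OXX"))  # 0: board full, no winner
```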
This covers terminal states only. Evaluating intermediate states is just as important because, even in a 3x3 game, there are many combinations to consider: if you don't merge symmetrical positions, there are up to 9! = 362,880 orders in which the nine cells can be filled. For those intermediate cases, you need to come up with an evaluation function that scores each state relative to the other states.
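One common choice for such an evaluation function (an illustrative assumption, not something the answer prescribes) counts the lines still open to each player, meaning lines that contain none of the opponent's marks, and subtracts. The board encoding (9 characters, row by row, space for empty) and the `evaluate` name are hypothetical:

```python
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
         (0, 4, 8), (2, 4, 6)]             # diagonals


def evaluate(board: str, agent: str = "X") -> int:
    """Heuristic score of a non-terminal state for `agent`:
    lines still open to `agent` minus lines still open to the opponent."""
    opp = "O" if agent == "X" else "X"
    open_for_agent = sum(1 for line in LINES if all(board[i] != opp for i in line))
    open_for_opp = sum(1 for line in LINES if all(board[i] != agent for i in line))
    return open_for_agent - open_for_opp


print(evaluate("    X    "))    # 4: a centre X blocks four of O's lines
print(evaluate("X" + " " * 8))  # 3: a corner X blocks only three
```

The score ranges from -8 to 8, so it can rank intermediate states against each other, but it should be kept below whatever value is assigned to a real win.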
Suppose that I assign the terminal states the values 810 (win), 0 (tie), and -810 (loss), and divide by the number of moves it took to get there: score = 810 / (number of moves). So if I reach a winning terminal state in 6 moves, the score is 810 / 6 = 135; in 9 moves, it is 810 / 9 = 90. An evaluation function built this way favors moves that reach a terminal state faster. However, it still only scores leaf nodes. We need to evaluate positions before reaching a leaf node, though, and that can also be part of an evaluation function.
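The move-count scaling above fits in one line; this is a minimal sketch, and the `discounted_score` name is hypothetical. With the same formula, a loss reached later comes out less negative, so the search also prefers to delay a loss:

```python
def discounted_score(result: int, moves: int, base: int = 810) -> float:
    """Scale a terminal result (1 win, 0 tie, -1 loss) by the moves it took."""
    return result * base / moves


print(discounted_score(1, 6))   # 135.0
print(discounted_score(1, 9))   # 90.0
print(discounted_score(-1, 6))  # -135.0
```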
Suppose that, in the game below, player 1 is X and X moves next. The legal moves (row, column) for X, counting rows and columns from 0, are:
(1) 0,0
(2) 0,2
(3) 2,0
(4) 2,1
(5) 2,2
| |O| |
|O|X|X|
| | | |
The utility value for each move should favor the best moves.
The best moves in this case are (2) and (5): each gives X two threats at once, and O can block only one of them. So an evaluation function might assign each of them a utility value of, for instance, 81. Move (4) is the worst possible move for X (it guarantees a loss against an intelligent player, who replies with O at 0,0 and threatens two lines at once), so the function would assign it a value of -9. Moves (1) and (3), while not ideal, will not make you lose, so we might assign them 1.
So when minimax evaluates those 5 moves, because your player X is the maximizing player (max), it will choose either (2) or (5).
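To check these judgments, here is a minimal sketch that runs exact minimax (utility 1, 0, -1 for X) after each of the five legal moves. It assumes a hypothetical 9-character board encoding, read row by row with a space for an empty cell, and memoizes with `functools.lru_cache` so the search stays fast:

```python
from functools import lru_cache

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
         (0, 4, 8), (2, 4, 6)]             # diagonals


def winner(board: str):
    """Return 'X' or 'O' if that player has three in a line, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def minimax(board: str, to_move: str) -> int:
    """Exact value for X (1 win, 0 tie, -1 loss); X maximizes, O minimizes."""
    w = winner(board)
    if w is not None:
        return 1 if w == "X" else -1
    if " " not in board:
        return 0
    nxt = "O" if to_move == "X" else "X"
    values = [minimax(board[:i] + to_move + board[i + 1:], nxt)
              for i, cell in enumerate(board) if cell == " "]
    return max(values) if to_move == "X" else min(values)


def score_moves(board: str) -> dict:
    """Minimax value of every legal X move, keyed by (row, col)."""
    return {divmod(i, 3): minimax(board[:i] + "X" + board[i + 1:], "O")
            for i, cell in enumerate(board) if cell == " "}


# The example position, row by row: " O ", "OXX", "   ".
scores = score_moves(" O " + "OXX" + "   ")
print(scores)  # {(0, 0): 0, (0, 2): 1, (2, 0): 0, (2, 1): -1, (2, 2): 1}
best = max(scores.values())
print([move for move, v in scores.items() if v == best])  # [(0, 2), (2, 2)]
```

The exact search agrees with the ranking above: (2) and (5) force a win, (4) loses against best play, and (1) and (3) lead to a tie.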
If we play (2) or (5), the game reaches a terminal state two moves later (O blocks one threat, then X completes the other). So, in practice, the evaluation function should look 2 moves beyond the current legal moves to return the utility values. (This follows the lines of depth-limited search, where the function evaluates at a certain depth and produces a utility value without reaching a leaf node, i.e., a terminal state.)
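A minimal sketch of that depth-limited idea. It assumes the open-lines heuristic (lines still winnable by X minus lines still winnable by O) as the evaluation at the cut-off, and a win/loss value of ±100 so that a real result always outweighs a heuristic one; the function names, the board encoding, and these numbers are hypothetical:

```python
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
         (0, 4, 8), (2, 4, 6)]             # diagonals

WIN = 100  # outweighs any heuristic score, which lies between -8 and 8


def winner(board: str):
    """Return 'X' or 'O' if that player has three in a line, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def evaluate(board: str) -> int:
    """Heuristic for X: lines with no O minus lines with no X."""
    open_x = sum(1 for line in LINES if all(board[i] != "O" for i in line))
    open_o = sum(1 for line in LINES if all(board[i] != "X" for i in line))
    return open_x - open_o


def depth_limited(board: str, to_move: str, depth: int) -> int:
    """Minimax that stops after `depth` more moves and falls back on `evaluate`."""
    w = winner(board)
    if w is not None:
        return WIN if w == "X" else -WIN
    if " " not in board:
        return 0
    if depth == 0:
        return evaluate(board)
    nxt = "O" if to_move == "X" else "X"
    values = [depth_limited(board[:i] + to_move + board[i + 1:], nxt, depth - 1)
              for i, cell in enumerate(board) if cell == " "]
    return max(values) if to_move == "X" else min(values)


def best_move(board: str, depth: int = 2):
    """X's move as (row, col), looking `depth` moves past each legal move."""
    moves = [i for i, cell in enumerate(board) if cell == " "]
    best = max(moves, key=lambda i: depth_limited(board[:i] + "X" + board[i + 1:], "O", depth))
    return divmod(best, 3)


print(best_move(" O " + "OXX" + "   "))  # (0, 2), i.e. move (2)
```

With `depth=2`, the search sees O's reply and X's follow-up, which is exactly enough to recognize the forced wins after (2) and (5); every other move is scored by the heuristic.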
Now I'll circle back to my first statement: the utility value is determined by an evaluation function coded according to the programmer's knowledge of the game.
Hopefully, I'm not confusing you...