我可以使用什么井字棋游戏算法来确定“最佳动作”? 对于人工智能?
在井字棋实现中,我认为具有挑战性的部分是确定机器要采取的最佳动作。
可以追求哪些算法? 我正在研究从简单到复杂的实现。 我将如何解决这部分问题?
In a tic-tac-toe implementation I guess that the challenging part is to determine the best move to be played by the machine.
What are the algorithms that can pursued? I'm looking into implementations from simple to complex. How would I go about tackling this part of the problem?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(10)
生成每个可能的棋盘并根据稍后在树中进一步生成的棋盘对其进行评分的强力方法不需要太多内存,特别是当您认识到 90 度棋盘旋转是多余的(就像垂直翻转一样),水平轴和对角轴。
一旦达到这一点,树形图中就会有不到 1k 的数据来描述结果,这也是计算机的最佳移动方式。
-亚当
The brute force method of generating every single possible board and scoring it based on the boards it later produces further down the tree doesn't require much memory, especially once you recognize that 90 degree board rotations are redundant, as are flips about the vertical, horizontal, and diagonal axis.
Once you get to that point, there's something like less than 1k of data in a tree graph to describe the outcome, and thus the best move for the computer.
-Adam
由于您只处理可能位置的 3x3 矩阵,因此只需编写对所有可能性的搜索就非常容易,而不会增加您的计算能力。 对于每个开放空间,计算标记该空间后的所有可能结果(我想说,递归地),然后使用最有可能获胜的举动。
优化这个确实是浪费精力。 虽然一些简单的可能是:
另一队,阻止第一队
你发现(如果有 2 个游戏
无论如何)。
(并且之前的规则没有
候选人)。
如果前面的规则为空)
Since you're only dealing with a 3x3 matrix of possible locations, it'd be pretty easy to just write a search through all possibilities without taxing you computing power. For each open space, compute through all the possible outcomes after that marking that space (recursively, I'd say), then use the move with the most possibilities of winning.
Optimizing this would be a waste of effort, really. Though some easy ones might be:
the other team, block the first one
you find (if there are 2 the games
over anyway).
(and the previous rule has no
candidates).
if the previous rules are empty)
井字棋的典型算法应如下所示:
Board:代表棋盘的九元素向量。 我们存储2(表示
空白)、3(表示 X)或 5(表示 O)。
回合:一个整数,指示将要进行的游戏的哪一步。
第一步将用 1 表示,最后一步将用 9 表示。
算法
主要算法使用三个函数。
Make2:如果棋盘的中心方块为空白,即
board[5]=2
,则返回 5。 否则,此函数返回任何非角正方形(2、4、6或8)
。Posswin(p)
:如果玩家p
在下一步行动中无法获胜,则返回0; 否则,它返回构成获胜棋步的方格数。 该功能将使程序既能获胜又能阻止对手获胜。 该函数通过检查每一行、每一列和对角线来运行。 通过将整行(或列或对角线)的每个方格的值相乘,可以检查获胜的可能性。 如果乘积为18
(3 x 3 x 2
),则X
获胜。 如果产品是50
(5 x 5 x 2
),那么 O 可以获胜。 如果找到获胜行(列或对角线),则可以确定其中的空白方块,并通过该函数返回该方块的编号。Go (n)
:在 n 格内移动。 如果 Turn 为奇数,则此过程将 board[n]
设置为 3;如果 Turn 为偶数,则将 board[n]
设置为 5。 它还逐一递增。该算法对每一步都有一个内置策略。 它使奇数
如果下
X
就走,如果下O就走偶数。我用过。 让我知道你们的感受。
A typical algo for tic-tac-toe should look like this:
Board : A nine-element vector representing the board. We store 2 (indicating
Blank), 3 (indicating X), or 5 (indicating O).
Turn: An integer indicating which move of the game about to be played.
The 1st move will be indicated by 1, last by 9.
The Algorithm
The main algorithm uses three functions.
Make2: returns 5 if the center square of the board is blank i.e. if
board[5]=2
. Otherwise, this function returns any non-corner square(2, 4, 6 or 8)
.Posswin(p)
: Returns 0 if playerp
can’t win on his next move; otherwise, it returns the number of the square that constitutes a winning move. This function will enable the program both to win and to block opponents win. This function operates by checking each of the rows, columns, and diagonals. By multiplying the values of each square together for an entire row (or column or diagonal), the possibility of a win can be checked. If the product is18
(3 x 3 x 2
), thenX
can win. If the product is50
(5 x 5 x 2
), then O can win. If a winning row (column or diagonal) is found, the blank square in it can be determined and the number of that square is returned by this function.Go (n)
: makes a move in square n. this procedure sets board[n]
to 3 if Turn is odd, or 5 if Turn is even. It also increments turn by one.The algorithm has a built-in strategy for each move. It makes the odd numbered
move if it plays
X
, the even-numbered move if it plays O.I have used it. Let me know how you guys feel.
您可以让 AI 在一些示例游戏中自行玩耍以供学习。 使用监督学习算法来帮助它前进。
You can have the AI play itself in some sample games to learn from. Use a supervised learning algorithm, to help it along.
不使用比赛场地的尝试。
注意:当你有双倍和叉子时,检查你的双倍是否给对手一个双倍。如果给了,检查你的新强制点是否包含在你的叉子列表中。
An attempt without using a play field.
Note: When you have double and forks, check if your double gives the opponent a double.if it gives, check if that your new mandatory point is included in your fork list.
用数字分数对每个方块进行排名。 如果占据了一个方格,则继续进行下一个选择(按排名降序排列)。 你需要选择一个策略(有两个主要的策略用于第一个策略,三个(我认为)用于第二个策略)。 从技术上讲,您可以对所有策略进行编程,然后随机选择一种。 这将使对手变得更难以预测。
Rank each of the squares with numeric scores. If a square is taken, move on to the next choice (sorted in descending order by rank). You're going to need to choose a strategy (there are two main ones for going first and three (I think) for second). Technically, you could just program all of the strategies and then choose one at random. That would make for a less predictable opponent.
这个答案假设您了解如何实现 P1 的完美算法,并讨论如何在对抗普通人类玩家的条件下取得胜利,因为普通人类玩家比其他人更容易犯一些错误。
如果双方都发挥最佳,比赛当然应该以平局结束。 从人类的角度来看,P1 在角球中获胜的几率要高得多。 无论出于何种心理原因,P2 都会认为在中路比赛并不那么重要,这对他们来说是不幸的,因为这是唯一无法为 P1 创造胜利的反应。
如果P2确实在中心正确阻挡,P1应该打相反的角球,因为无论出于什么心理原因,P2都会更喜欢打角球的对称性,这再次为他们带来了失败的棋盘。
对于 P1 可能采取的任何起始行动,如果随后双方都发挥最佳状态,P2 可能采取的行动将为 P1 创造胜利。 从这个意义上说,P1 可以在任何地方发挥作用。 边缘移动是最弱的,因为对此移动的最大部分可能的反应会产生平局,但仍然有一些反应将为 P1 带来胜利。
根据经验(更准确地说,根据轶事),最佳的 P1 起始动作似乎是第一个角、第二个中心和最后一个边缘。
您可以亲自或通过 GUI 添加的下一个挑战不是显示电路板。 人类绝对可以记住所有状态,但增加的挑战导致人们偏爱对称板,这种板需要更少的努力来记住,从而导致我在第一个分支中概述的错误。
我知道我在聚会上很开心。
This answer assumes you understand implementing the perfect algorithm for P1 and discusses how to achieve a win in conditions against ordinary human players, who will make some mistakes more commonly than others.
The game of course should end in a draw if both players play optimally. At a human level, P1 playing in a corner produces wins far more often. For whatever psychological reason, P2 is baited into thinking that playing in the center is not that important, which is unfortunate for them, since it's the only response that does not create a winning game for P1.
If P2 does correctly block in the center, P1 should play the opposite corner, because again, for whatever psychological reason, P2 will prefer the symmetry of playing a corner, which again produces a losing board for them.
For any move P1 may make for the starting move, there is a move P2 may make that will create a win for P1 if both players play optimally thereafter. In that sense P1 may play wherever. The edge moves are weakest in the sense that the largest fraction of possible responses to this move produce a draw, but there are still responses that will create a win for P1.
Empirically (more precisely, anecdotally) the best P1 starting moves seem to be first corner, second center, and last edge.
The next challenge you can add, in person or via a GUI, is not to display the board. A human can definitely remember all the state but the added challenge leads to a preference for symmetric boards, which take less effort to remember, leading to the mistake I outlined in the first branch.
I'm a lot of fun at parties, I know.
对最小最大算法的井字游戏适应
我们需要一个可以检查结果的函数。 该函数将检查连续的字符。 无论棋盘状态如何,结果都是 4 个选项之一:不完整、玩家 X 获胜、玩家 O 获胜或平局。
我们的 getBestMove 函数将接收棋盘的状态以及我们想要确定最佳可能移动的玩家的符号。 我们的函数将使用 getResult 函数检查所有可能的移动。 如果是胜利,则得分为 1。如果是松,得分为 -1,平局得分为 0。如果不确定,我们将使用新状态调用 getBestMove 函数董事会的和相反的符号。 由于下一步是对手的,他的胜利就是当前玩家的失败,得分将被否定。 最后可能的移动得到 1,0 或 -1 的分数,我们可以对移动进行排序,并返回得分最高的移动。
实际算法,Github,更详细地解释该过程
A Tic-tac-toe adaptation to the min max algorithem
We'll need a function that can check for the result. The function will check for a succession of chars. What ever the state of the board is, the result is one of 4 options: either Incomplete, player X won, Player O won or a tie.
Our getBestMove function will receive the state of the board, and the symbol of the player for which we want to determine the best possible move. Our function will check all possible moves with the getResult function. If it is a win it will give it a score of 1. if it's a loose it will get a score of -1, a tie will get a score of 0. If it is undetermined we will call the getBestMove function with the new state of the board and the opposite symbol. Since the next move is of the oponent, his victory is the lose of the current player, and the score will be negated. At the end possible move receives a score of either 1,0 or -1, we can sort the moves, and return the move with the highest score.
Algorithm in action, Github, Explaining the process in more details
维基百科中关于玩完美游戏(每次获胜或平局)的策略似乎是简单的伪代码:
认识到“分叉”情况是什么样子,可以按照建议以暴力方式完成。
注意:“完美”的对手是一个很好的练习,但最终不值得“对抗”。 但是,您可以更改上述优先级,以赋予对手个性弱点。
The strategy from Wikipedia for playing a perfect game (win or tie every time) seems like straightforward pseudo-code:
Recognizing what a "fork" situation looks like could be done in a brute-force manner as suggested.
Note: A "perfect" opponent is a nice exercise but ultimately not worth 'playing' against. You could, however, alter the priorities above to give characteristic weaknesses to opponent personalities.
您需要的(对于井字游戏或像国际象棋这样更困难的游戏)是minimax算法,或其稍微复杂的变体,alpha-beta 修剪。 不过,对于像井字棋这样搜索空间很小的游戏来说,普通的朴素极小极大值就足够了。
简而言之,你要做的不是寻找对你来说可能有最好结果的走法,而是寻找尽可能最坏结果的走法。 如果你假设你的对手玩得最好,你就必须假设他们会采取对你来说最差的举动,因此你必须采取最小化他们的最大收益的举动。
What you need (for tic-tac-toe or a far more difficult game like Chess) is the minimax algorithm, or its slightly more complicated variant, alpha-beta pruning. Ordinary naive minimax will do fine for a game with as small a search space as tic-tac-toe, though.
In a nutshell, what you want to do is not to search for the move that has the best possible outcome for you, but rather for the move where the worst possible outcome is as good as possible. If you assume your opponent is playing optimally, you have to assume they will take the move that is worst for you, and therefore you have to take the move that MINimises their MAXimum gain.