如何为游戏创建一个好的评价函数?
有时我会编写程序来玩棋盘游戏。基本策略是标准的 alpha-beta 修剪或类似的搜索,有时通过常见的残局或开局方法进行增强。我主要玩过国际象棋的变体,所以当需要选择我的评估函数时,我使用基本的国际象棋评估函数。
然而,现在我正在编写一个程序来玩一个全新的棋盘游戏。如何选择一个好的甚至像样的评估函数?
主要的挑战是棋盘上总是有相同的棋子,所以通常的物质函数不会根据位置而改变,而且游戏玩过的次数还不到一千次左右,所以人类不一定玩得足够多还没有给出见解。 (PS。我考虑过 MoGo 方法,但随机游戏不太可能终止。)
游戏详细信息:游戏在 10×10 的棋盘上进行,每面固定有 6 个棋子。这些棋子有一定的运动规则,并以一定的方式相互作用,但没有一个棋子被捕获。游戏的目标是在棋盘上的某些特殊方格中拥有足够的棋子。计算机程序的目标是提供一个与当前人类玩家具有竞争力或更好的玩家。
I write programs to play board game variants sometimes. The basic strategy is standard alpha-beta pruning or similar searches, sometimes augmented by the usual approaches to endgames or openings. I've mostly played around with chess variants, so when it comes time to pick my evaluation function, I use a basic chess evaluation function.
However, now I am writing a program to play a completely new board game. How do I choose a good or even decent evaluation function?
The main challenges are that the same pieces are always on the board, so a usual material function won't change based on position, and the game has been played less than a thousand times or so, so humans don't necessarily play it enough well yet to give insight. (PS. I considered a MoGo approach, but random games aren't likely to terminate.)
Game details: The game is played on a 10-by-10 board with a fixed six pieces per side. The pieces have certain movement rules, and interact in certain ways, but no piece is ever captured. The goal of the game is to have enough of your pieces in certain special squares on the board. The goal of the computer program is to provide a player which is competitive with or better than current human players.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(8)
我将从一些基础知识开始,稍后再转向更难的内容。
基本代理和测试框架
无论您采用哪种方法,都需要从非常简单和愚蠢的东西开始。对于愚蠢的智能体来说,最好的方法是随机的(生成所有可能的动作,随机选择一个)。这将作为比较所有其他代理的起点。您需要一个强大的比较框架。需要不同的代理,允许在它们之间玩一定数量的游戏并返回性能矩阵。根据结果,您可以计算每个代理的适合度。例如,您的函数
tournament(agent1, agent2, agent3, 500)
将在每对代理之间进行 500 场比赛(进行第一场/第二场比赛),并返回如下内容:例如,这里我使用 2 分对于胜利,平局计分功能得 1 分,最后将所有内容相加以找到适合度。该表立即告诉我,
agent3
是最好的,agent1
与agent2
没有什么不同。因此,一旦设置了这两件重要的事情,您就可以尝试评估函数了。
让我们从选择特征开始
首先,您需要创建
不是一个糟糕的
评估函数。我的意思是这个函数应该正确识别 3 个重要方面(赢/平局/输)。这听起来很明显,但我见过大量的机器人,其中创建者无法正确设置这 3 个方面。然后你利用人类的聪明才智来找到游戏状态的一些特征。首先要做的就是与游戏专家交谈并询问他如何获得该位置。
如果您没有专家,或者您只是在 5 分钟前创建了游戏规则,请不要低估人类搜索模式的能力。即使玩了几局之后,聪明的人也可以给你他应该怎么玩的想法(这并不意味着他可以实现这些想法)。使用这些想法作为特征。
此时您并不真正需要知道这些功能如何影响游戏。特征示例:棋子的价值、棋子的移动性、重要位置的控制、安全性、可能的移动总数、结束的接近程度。
对这些功能进行编码并分别使用它们来查看哪些效果最好(不要急于放弃那些本身性能不合理的功能,它们与其他功能结合使用可能会有所帮助)后,您就可以开始尝试了组合。
通过组合和加权简单特征来构建更好的评估。有几种标准方法。
根据您的功能的各种组合创建一个超级功能。它可以是线性的
eval = f_1 * a_1 + ... f_n * a_n
(f_i
特征,a_i
系数),但它可以是任何东西。然后为这个评估函数实例化许多具有绝对随机权重的代理,并使用遗传算法让它们相互竞争。使用测试框架比较结果,丢弃一些明显的失败者并改变一些获胜者。继续相同的过程。 (这是一个粗略的轮廓,请阅读有关 GA 的更多信息)使用神经网络的反向传播思想来反向传播游戏结束时的误差,以更新网络的权重。您可以通过 西洋双陆棋(我没有写过类似的东西,很抱歉简短)。
无需评估函数即可工作!对于只听说过 minimax/alpha-beta 的人来说,这可能听起来很疯狂,但有些方法根本不需要评估。其中之一称为蒙特卡罗树搜索,顾名思义,它使用蒙特卡罗很多随机的(它不应该是随机的,它可以使用你以前的好代理)游戏来生成一棵树。这本身就是一个很大的话题,所以我会给你我的非常高层次的解释。你从一个根开始,创建你的边界,并尝试扩展它。一旦你扩展了某些东西,你就可以随机地去到叶子。从叶子中获取结果,然后反向传播结果。多次执行此操作,并收集有关当前边界的每个子项的统计数据。选择最好的一个。那里有一个重要的理论,涉及如何在探索和利用之间取得平衡,UCT(置信上限算法)是值得一读的好东西。
I will start with some basics and move to harder stuff later.
Basic agent and a testing framework
No matter what approach you take you need to start with something really simple and dumb. The best approach for a dumb agent is a random one (generate all possible moves, select one at random). This will serve as a starting point to compare all your other agents. You need a strong framework for comparison. Something that takes various agents, allows to play some number of games between them and returns the matrix of the performance. Based on the results, you calculate the fitness for each agent. For example your function
tournament(agent1, agent2, agent3, 500)
will play 500 games between each pair of agent (playing the first/second) and returns you something like:Here for example I use 2 points for a win, 1 point for draw scoring function, and at the end just summing everything to find the fitness. This table immediately tells me that
agent3
is the best, andagent1
is not really different fromagent2
.So once these two important things are set up you are ready to experiment with your evaluation functions.
Let's start with selecting features
First of all you need to create
not a terrible
evaluation function. By this I mean that this function should correctly identify 3 important aspects (win/draw/loss). This sounds obvious, but I have seen significant amount of bots, where the creators were not able to correctly set up these 3 aspects.Then you use your human ingenuity to find some features of the game state. The first thing to do is to speak with a game expert and ask him how he access the position.
If you do not have the expert, or you even just created the rules of your game 5 minutes ago, do not underestimate the human's ability to search for patters. Even after playing a couple of games, a smart person can give you ideas how he should have played (it does not mean that he can implement the ideas). Use these ideas as features.
At this point you do not really need to know how these features affect the game. Example of features: value of the pieces, pieces mobility, control of important positions, safety, total number of possible moves, closeness to a finish.
After you coded up these features and used them separately to see what works best (do not hurry up to discard features that do not perform reasonable by itself, they might be helpful in conjunction with others), you are ready to experiment with combinations.
Building better evaluations by combining and weighting simple features. There are a couple of standard approaches.
Create an uber function based on various combinations of your features. It can be linear
eval = f_1 * a_1 + ... f_n * a_n
(f_i
features,a_i
coefficients), but it can be anything. Then instantiate many agents with absolutely random weights for this evaluation function and use genetic algorithm to play them agains each other. Compare the results using the testing framework, discard a couple of clear losers and mutate a couple of winners. Continue the same process. (This is a rough outline, read more about GA)Use the back-propagation idea from a neural networks to back propagate the error from the end of the game to update the weights of your network. You can read more how it was done with backgammon (I have not written anything similar, so sorry for the shortness).
You can work without evaluation function! This might sound insane for a person who only heard about minimax/alpha-beta, but there are methods which do not require an evaluation at all. One of them is called Monte Carlo Tree Search and as a Monte Carlo in a name suggests it uses a lot of random (it should not be random, it can use your previous good agents) game plays to generate a tree. This is a huge topic by itself, so I will give you mine really high-level explanation. You start with a root, create your frontier, which you try to expand. Once you expand something, you just randomly go to the leaf. Getting the result from the leaf, you backpropagate the result. Do this many many times, and collect the statistics about each child of the current frontier. Select the best one. There is significant theory there which relates to how do you balance between exploration and exploitation and a good thing to read there is UCT (Upper Confidence Bound algorithm)
为您的评估函数找到一些候选函数,例如移动性(可能的移动数量)减去对手的移动性,然后尝试找到每个指标的最佳权重。遗传算法似乎对于优化评估函数中的权重非常有效。
创建具有随机权重的群体,以有限的深度和回合相互对抗,用获胜者的随机组合替换失败者,洗牌并重复,打印出每一代后的群体平均值。让它运行,直到您对结果感到满意,或者直到您发现需要调整某些指标的范围,然后重试(如果某个指标的最佳值可能超出了您的初始范围)。
后期编辑:一种更被接受、研究、理解的方法,我当时并不知道,叫做“差异进化”。后代是由 3 个父母而不是 2 个父母创建的,这样就避免了过早收敛于平均值的问题。
Find a few candidates for your evaluation function, like mobility (# of possible moves) minus opponent's mobility, then try to find the optimal weight for each metric. Genetic algorithms seem to work pretty well for optimizing weights in an evaluation function.
Create a population with random weights, fight them against each other with a limited depth and turns, replace the losers with random combinations from the winners, shuffle, and repeat, printing out the population average after every generation. Let it run until you're satisfied with the result, or until you see a need to adjust the range for some of the metrics and try again, if it appears that the optimal value for one metric might be outside your initial range.
Late edit: A more accepted, studied, understood approach that I didn't know at the time is something called "Differential Evolution". Offspring are created from 3 parents instead of 2, in such a way that avoids the problem of premature convergence towards the average.
我会研究监督机器学习算法,例如强化学习。查看棋盘游戏中的强化学习。我认为这会给你一些很好的研究方向。
另外,请查看基于强化学习的黑白棋策略获取( PDF 链接)在给定游戏规则的情况下,可以学习良好的“支付函数”。这与 TD-Gammon 密切相关...
I would look at a supervised machine learning algorithm such as reinforcement learning. Check out Reinforcement learning in board games. I think that will give you some good directions to look into.
Also, check out Strategy Acquisition for the Game Othello Based on Reinforcement Learning (PDF link) where given the rules of the game, a good "payoff function" can be learned. This is closely related to TD-Gammon ...
如果还没有人理解游戏,你就无法获得像样的评估函数。不要告诉我,具有材料数量的标准 alpha-beta 对于国际象棋或其变体来说是好的,甚至是不错的(也许失败者的国际象棋是一个例外)。
您可以尝试带有反馈的神经网络或类似的机器学习算法,但它们通常会很糟糕,直到它们经过大量训练,而在这种情况下可能无法进行训练。即使如此,如果它们不糟糕,你也无法从它们那里获得知识。
我认为,尽可能地理解游戏是不可能的,对于初学者来说,将未知数保留为随机的评估函数(或者只是将其排除在外,直到未知数变得更好)。
当然,如果您愿意分享有关游戏的更多信息,您可以从社区获得更好的想法。
If nobody understands the game yet, there's no way you can get a decent evaluation function. Don't tell me that standard alpha-beta with material count is good or even decent for chess or its variants (maybe losers' chess is an exception).
You could try neural networks with feedback or similar machine learning algorithms but they usually suck until they have tons of training, which in this case is probably not available. And even then, if they don't suck, you can't gain knowledge from them.
I think there's no way short of understanding the game the best you can and, for starters, leave the unknowns as random on the evaluation function (or just out of the picture until the unknowns become better known).
Of course, if you'd share more info about the game you could get better ideas from the community.
据我了解,您需要一个良好的静态评估函数来在最小-最大树的叶子上使用。如果是这样,最好记住这个静态评估函数的目的是提供一个关于该棋盘对于计算机玩家来说有多好的评级。 如此
f(board1) > 也是 f(board2)
那么 board1 一定比 board2 对计算机更好(它最终获胜的可能性更大)。当然,没有静态函数对于所有板来说都是完全正确的。
所以,你说“游戏的目标是在棋盘上的某些特殊方格中拥有足够的棋子”,因此 f(board) 的第一次尝试只是简单地计算计算机在这些棋盘上拥有的棋子数量。特殊的正方形。然后你可以更加巧妙地处理它。
在不了解游戏细节的情况下,不可能给出更好的猜测。如果您给我们游戏规则,我相信 stackoverflow 用户将能够对此类功能提出大量原创想法。
As I understand it, you want a good static evaluation function to use at the leaves of your min-max tree. If so, it is best to remember that the purpose of this static evaluation function is to provide a rating as to how good that board is for the computer player. So is
f(board1) > f(board2)
then it must be true that board1 is better for the computer (it is more likely to eventually win) than in board2. Of course, no static function is ever completely correct for all boards.
So, you say that "The goal of the game is to have enough of your pieces in certain special squares on the board", so a first stab at f(board) would simply be to count the number of pieces the computer has on those special squares. You can then finesse it more.
Without knowing the specifics of the game its impossible to give better guesses. If you gave us the game rules I am sure the stackoverflow users would be able to come with tons of original ideas for such functions.
虽然您可以使用各种机器学习方法来提出评估函数(TD-Learning,用于 gnubackgammon 等项目中就是一个这样的例子),但结果绝对取决于游戏本身。对于双陆棋来说,它的效果非常好,因为游戏的随机性(掷骰子)迫使学习者探索他们可能不想做的领域。如果没有这样一个关键组件,您最终可能会得到一个对其自身有利但对其他人不利的评估函数。
由于实质性差异可能不适用,因此流动性的概念重要吗?即您有多少种可能的移动方式?控制棋盘的某个区域通常比不控制更好吗?与玩游戏的人交谈以找出一些线索。
虽然最好拥有尽可能好的评估函数,但您还需要调整搜索算法,以便可以尽可能深入搜索。有时,这实际上更值得关注,因为具有 medicore 评估函数的深度搜索器可以胜过具有良好评估函数的浅层搜索。这一切都取决于域。 (例如,gnubackgammon 通过 1 层搜索来玩专家游戏)
您还可以使用其他技术来提高搜索质量,最重要的是,使用转置表来缓存搜索结果,以实现良好的前向修剪。
我强烈建议您查看这些幻灯片。
While you could use various machine learning methods to come up with an evaluation function (TD-Learning, used in such projects such as gnubackgammon, is one such example), the results are definitely dependent on the game itself. For backgammon, it works really well, because the stochastic nature of the game (rolling dice) forces the learner to explore territory it may not want to do. Without such a crucial component, you will probably end up with an evaluation function which is good against itself, but not against others.
Since material difference may not be applicable, is the concept of mobility important -- i.e. how many possible moves you have available? Is controlling a certain area of the board usually better than not? Talk to the people who play the game to find out some clues.
While it's preferable to have as good of an evaluation function as you can, you also need to tune your search algorithm so you can search as deeply as possible. Sometimes, this is actually more of a concern, since a deep searcher with a medicore evaluation function can outplay shallow searches with a good evaluation function. It all depends on the domain. (gnubackgammon plays an expert game with a 1-ply search, for example)
There are other techniques you can use to improve the quality of your search, most importantly, to have a transposition table to cache search results to have sound forward pruning.
I highly recommend looking over these slides.
您还需要谨慎选择。如果您的算法与实际值没有已知的关系,标准人工智能功能将无法正常工作。为了有效,你的评估函数或启发式必须与实际值相同或始终低于实际值,否则它将以一种奇怪的方式指导你的决策(这可能会支持国际象棋,尽管我认为标准分很好)。
我通常做的就是找出我的能力和需求。对于某些游戏(例如推箱子),我使用了将一个盒子(单独)从当前位置移动到任何目标位置所需的最少移动次数。这不是所需移动次数的准确答案,但我认为这是一个非常好的启发式方法,因为它永远不会高估,并且可以为整个棋盘预先计算。当对棋盘的分数进行求和时,它只是每个当前框位置的值的总和。
在我编写的一个进化群体狩猎和群体防御的人工生命模拟中,我使用的评分系统只是为了指导进化,而不是进行任何修剪。我为每个生物的诞生赋予一分。他们一生中每消耗一分能量,我就给他们额外一分。然后我用他们这一代人的分数总和来确定每个人繁殖的可能性。就我而言,我只是使用了他们获得的这一代总积分的比例。如果我想进化出擅长躲避的生物,我就会因为吃掉它们而得分。
您还应该注意,您的功能并不是太难实现的目标。如果你想进化一些东西,你需要确保解决方案空间有一个合适的斜率。你想要引导进化朝着一个方向发展,而不是仅仅在随机命中时就宣布胜利。
如果不了解更多关于你的游戏的信息,我很难告诉你如何构建一个函数。是否有明确的价值来表明胜利或失败?您有办法估算缩小差距的最低成本吗?
如果您提供更多信息,我很乐意尝试提供更多见解。还有很多关于该主题的优秀书籍。
雅各布
You also need to be careful on your choice. If your algorithm does not have a known relation to the actual value, the standard AI functions will not work properly. To be valid, your evaluation function, or heuristic has to be the same as, or below the actual value consistently or it will guide your decisions in an odd way (which one could argue for chess, even though I think the standard points are fine).
What I typically do is find out what is capable and what is required. For some games, like sokoban, I have used the minimum number of box moves required to get one box (in isolation) from its current location to any of the goal locations. This is not an accurate answer for the number of required moves, but I think it is a pretty good heuristic since it can never overestimate and it can be pre-calculated for the entire board. When summing the score for a board it is just the sum of the values for each current box location.
In an artificial life simulation that I wrote to evolve pack hunting and pack defense, the scoring system that I used was only to guide evolution and not to perform any pruning. I gave each creature one point for being born. For each point of energy that they consumed in their life, I gave them one additional point. I then used the sum of the points of their generation to determine how likely each was to reproduce. In my case, I simply used the proportion of the total points of their generation that they had acquired. If I had wanted to evolve creatures that were great at evading, I would have scored down for getting points eaten off of them.
You should also be careful that your function is not too hard of a goal to hit. If you are trying to evolve something, you want to make sure the solution space has a decent slope. You want to guide the evolution in a direction, not just declare a victory if it happens to randomly hit.
Without knowing more about your game I would be hard pressed to tell you how to build a function. Are there clear values of something that indicate a win or a loss? Do you have a way of estimating a minimum cost to close the gap?
If you provide more information, I would be happy to try and provide more insight. There are lots of excellent books on the topic as well.
Jacob
请记住,并不一定存在像样的评估函数。对于这个陈述,我假设评估函数必须具有低复杂度(P)。
Take in mind that it's not nescessarily true that a decent evaluation function even exists. For this statement I assume that, an evaluation function has to be of low complexity (P).