为“邻近”游戏找到最佳/足够好的策略和 AI?
“Proximity”是一款类似于黑白棋、围棋和风险的领土统治策略游戏。 两名玩家,使用 10x12 六角网格。由 Brian Cable 于 2007 年发明的游戏。
似乎是一个值得讨论 a) 最佳算法 b) 如何构建 AI 的游戏。
由于随机因素和疯狂的分支因素 (20^120),策略将是概率性或启发式的。 所以很难客观地进行比较。 每回合最多 5 秒的计算时间限制似乎是合理的 =>这排除了所有暴力尝试。(在专家级别上玩游戏的人工智能来感受一下 - 基于一些简单的启发式,它做得非常好)
游戏: 此处为 Flash 版本,iPhone 版本 iProximity 此处 以及网络上其他地方的许多副本 规则:此处
对象:控制大多数军队在所有瓷砖都放置完毕后。你从一个空的六角板开始。每回合您都会收到一个随机编号的图块(数值在 1 到 20 支军队之间),可以放置在任何空闲的棋盘空间上。如果该图块与任何盟友图块相邻,它将增强这些图块中每个图块的防御力 +1(最多可达 20)。如果它与任何敌人图块相邻,如果它的数量高于敌人图块上的数量,它将控制它们。
关于策略的想法:以下是一些初步想法;将计算机人工智能设置为“专家”可能会教很多东西:
- 最小化你的周长似乎是一个很好的策略,以防止翻转并最小化最坏情况的损害,
- 就像围棋一样,在你的阵型中留下洞是致命的,对于六角网格来说更是如此因为你可能会在一次移动中失去最多 6 个方格的军队,所以
- 低数量的棋子是一种负担,所以将它们放置在远离你的主要领土、靠近棋盘边缘并分散的地方。您还可以使用数量较少的方块来堵住阵型中的漏洞,或者沿着周边取得较小的收益,而对手不会倾向于攻击。
- 由三块组成的三角形很坚固,因为它们相互增强,并且还减少了周长。
- 每个瓷砖最多可以翻转 6 次,即当其相邻瓷砖被占据时。对编队的控制可以来回流动。有时,您会丢失阵型的一部分并堵住任何漏洞,以使棋盘的该部分“死亡”并锁定您的领土/防止进一步的损失。
- 低编号的瓷砖是明显但价值低的负债,但高编号的瓷砖如果被翻转(这更难)可能会成为更大的负债。一次幸运地使用 20 支军队的棋子可以导致 200 的波动(从 +100 到 -100 军队)。所以瓷砖的摆放会有进攻性和防守性的考虑。
评论 1,2,4 似乎类似于极小极大策略,其中我们最小化最大预期可能损失(通过对手可以从 1..20 获得的值 ß 的一些概率考虑进行修改,即只能由 ß 翻转的结构=20 块瓷砖“几乎坚不可摧”。) 我不清楚评论 3、5、6 对最佳策略的影响。 对围棋、国际象棋或黑白棋棋手的评论感兴趣。
(续作 ProximityHD for Xbox Live 允许 4 人合作或竞争的本地多人游戏 增加了分支因素,因为你现在在任何给定时间手中都有 5 个棋子,其中你只能使用其中一个盟友棋子的强化增加到每个盟友 +2。)
'Proximity' is a strategy game of territorial domination similar to Othello, Go and Risk.
Two players, uses a 10x12 hex grid. Game invented by Brian Cable in 2007.
Seems to be a worthy game for discussing a) optimal algorithm then b) how to build an AI.
Strategies are going to be probabilistic or heuristic-based, due to the randomness factor, and the insane branching factor (20^120).
So it will be kind of hard to compare objectively.
A compute time limit of 5 seconds max per turn seems reasonable => this rules out all brute-force attempts. (Play the game's AI on Expert level to get a feel - it does a very good job based on some simple heuristic)
Game: Flash version here, iPhone version iProximity here and many copies elsewhere on the web
Rules: here
Object: to have control of the most armies after all tiles have been placed. You start with an empty hexboard. Each turn you receive a randomly numbered tile (value between 1 and 20 armies) to place on any vacant board space. If this tile is adjacent to any ALLY tiles, it will strengthen each of those tile's defenses +1 (up to a max value of 20). If it is adjacent to any ENEMY tiles, it will take control over them IF its number is higher than the number on the enemy tile.
Thoughts on strategy: Here are some initial thoughts; setting the computer AI to Expert will probably teach a lot:
- minimizing your perimeter seems to be a good strategy, to prevent flips and minimize worst-case damage
- like in Go, leaving holes inside your formation is lethal, only more so with the hex grid because you can lose armies on up to 6 squares in one move
- low-numbered tiles are a liability, so place them away from your main territory, near the board edges and scattered. You can also use low-numbered tiles to plug holes in your formation, or make small gains along the perimeter which the opponent will not tend to bother attacking.
- a triangle formation of three pieces is strong since they mutually reinforce, and also reduce the perimeter
- Each tile can be flipped at most 6 times, i.e. when its neighbor tiles are occupied. Control of a formation can flow back and forth. Sometimes you lose part of a formation and plug any holes to render that part of the board 'dead' and lock in your territory/ prevent further losses.
- Low-numbered tiles are obvious-but-low-valued liabilities, but high-numbered tiles can be bigger liabilities if they get flipped (which is harder). One lucky play with a 20-army tile can cause a swing of 200 (from +100 to -100 armies). So tile placement will have both offensive and defensive considerations.
Comment 1,2,4 seem to resemble a minimax strategy where we minimize the maximum expected possible loss (modified by some probabilistic consideration of the value ß the opponent can get from 1..20 i.e. a structure which can only be flipped by a ß=20 tile is 'nearly impregnable'.)
I'm not clear what the implications of comments 3,5,6 are for optimal strategy.
Interested in comments from Go, Chess or Othello players.
(The sequel ProximityHD for XBox Live, allows 4-player -cooperative or -competitive local multiplayer increases the branching factor since you now have 5 tiles in your hand at any given time, of which you can only play one. Reinforcement of ally tiles is increased to +2 per ally.)
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
这里是 U of A GAMES 小组的前成员。
这个分支因素太疯狂了。比Go差远了。
基本上,你被套住了。
该游戏的问题在于,由于选择了随机图块,因此它不是确定性的。这实际上在树中每个现有节点层之间添加了另一层节点。您会对我的关于 *-Minimax 的出版物感兴趣了解在随机域中搜索的技术。
为了在本世纪末之前完成单层搜索,您将需要一些非常积极的前向修剪技术。尽早将可证明的最佳走法扔出窗外,集中精力建立良好的走法顺序。
A former member of the U of A GAMES group here.
That branching factor is insane. Far worse than Go.
Basically, you're hooped.
The problem with this game is that it is not deterministic due to the selection of a random tile. This actually adds another layer of nodes between each existing layer of nodes in the tree. You'll be interested in my publications on *-Minimax to learn about techniques for searching in stochastic domains.
In order to complete one-ply searches before the end of this century, you're going to need some very aggressive forward pruning techniques. Throw provably best move out the window early and concentrate on building good move ordering.
对于通用算法,我建议您查看阿尔伯塔大学人工智能游戏小组所做的研究:http://games .cs.ualberta.ca 那里的许多算法保证找到最优策略。然而,我怀疑你真的有兴趣找到最佳的,以“足够好”为目标,除非你想在韩国销售该游戏:D
从你的描述中,我了解到该游戏是一个两人游戏,具有完整的-可观察性,即没有隐藏单元和完全确定性,即玩家的行为结果不需要滚动,那么您应该看看阿尔伯塔大学提出的实时有界搜索极小极大导数。然而,能够对值函数的备份深度进行绑定可能是为游戏添加“难度级别”的好方法。他们一直在做一些工作——在我看来有点可疑——对搜索空间进行采样以改进价值函数估计。
关于您描述的“策略”部分:在我提到的框架中,您必须将该知识编码为评估函数。看看 Michael Büro 和其他人(也在阿尔伯塔大学小组)的工作,了解此类知识工程的例子。
另一种可能性是将问题视为强化学习问题,其中对手的动作被编译为“后状态”。在 Barto & 上查找一下。萨顿书:http://webdocs.cs.ualberta.ca/ ~sutton/book/the-book.html 然而,这样的编译所产生的 RL 问题的价值函数可能会有点难以最佳地解决 - 状态的数量会像氢弹一样爆炸。但是,如果您了解如何使用分解表示,事情就会变得容易得多。你的“策略”也许可以被编码为某种塑造函数,这将大大加快学习过程。
编辑:该死的英语介词
For general algorithms, I would suggest you to check the research done by the Alberta University AI Games group: http://games.cs.ualberta.ca Many of the algorithms there guarantee to find optimal policies. However, I doubt you're really interested in finding the optimal, aim for the "good enough" unless you want to sell that game in Korea :D
From your description, I have understood the game to be a two-player with full-observability i.e. no hidden units and such and fully deterministic i.e. player's actions outcomes do not require rolling, then you should take a look at the real-time bounded-search minimax derivatives proposed by the U Alberta guys. However, being able to do bound as well the depth of the backups of the value function would perhaps be a nice way to add a "difficulty level" to your game. They have been doing some work - a bit fishy imo - on sampling the search space for improving value function estimates.
About the "strategy" section you describe: in the framework I am mentioning, you will have to encode that knowledge as an evaluation function. Look at the work of Michael Büro and others - also in the U Alberta group - for examples of such knowledge engineering.
Another possibility would be to pose the problem as a Reinforcement Learning problem, where adversary moves are compiled as "afterstates". Look that up on the Barto & Sutton book: http://webdocs.cs.ualberta.ca/~sutton/book/the-book.html However the value function for a RL problem resulting from such a compilation might prove a bit difficult to solve optimally - the number of states will blow up like an H-Bomb. However, if you see how to use a factored representation, things can be much easier. And your "strategy" could perhaps be encoded as some shaping function, which would be speeding up the learning process considerably.
EDIT: Damn English prepositions