Strategies for games with incomplete information

Published 2024-11-19 03:14:55


Are there general strategies for games with incomplete information, especially trick-taking games like Bridge and Doppelkopf?

I am interested in ways to implement AI strategies for such games.

The bounty is for the answer with the best description of a particular strategy.

Comments (5)

谁许谁一生繁华 2024-11-26 03:14:55


I would like to contribute some concrete information for the card game Doppelkopf, which the question mentions as an example. In 2012, Sievers wrote a master's thesis in which he adapted the UCT algorithm to the Doppelkopf game.

UCT normally assumes a perfect-information game, so he first solved the "card assignment" problem: guessing a card assignment for each player based on the cards that are already known.

With a solution to the card assignment problem in hand, he tried two ways of running the algorithm:

1) Guess a card assignment for each UCT tree and average the results over multiple trees. He calls this strategy ensemble UCT.

2) Take a single UCT tree and guess a new assignment for each rollout. In the selection phase of UCT you simply ignore all children that are inconsistent with the current assignment. He calls this strategy single UCT. (A rough sketch of both variants follows below.)
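
Here is a rough sketch of both variants, assuming a hypothetical `GameState` interface (`clone()`, `legal_moves()`, `play(move)`, `is_terminal()`, `result()`) and a `sample_card_assignment(observations)` helper that produces one determinization consistent with what the player has seen. None of these names come from the thesis, and the reward handling is deliberately simplified.

```python
import math
import random

C = 1.4  # UCT exploration constant

class Node:
    def __init__(self):
        self.visits = 0
        self.value = 0.0
        self.children = {}  # move -> Node

def ucb(parent, child):
    """UCB1 score used in the selection phase."""
    if child.visits == 0:
        return float("inf")
    return (child.value / child.visits
            + C * math.sqrt(math.log(parent.visits + 1) / child.visits))

def run_rollout(root, state):
    """One UCT iteration: selection, expansion, random playout, backpropagation.
    Simplification: every node maximizes the same score (the AI's result)."""
    node, path = root, [root]
    while not state.is_terminal():
        moves = state.legal_moves()
        untried = [m for m in moves if m not in node.children]
        if untried:  # expansion: add one new child, then stop descending
            move = random.choice(untried)
            child = Node()
            node.children[move] = child
            state.play(move)
            node = child
            path.append(node)
            break
        # Selection: only children whose moves are legal in *this* determinization
        # are considered, so inconsistent children are ignored automatically.
        move = max(moves, key=lambda m: ucb(node, node.children[m]))
        state.play(move)
        node = node.children[move]
        path.append(node)
    while not state.is_terminal():  # random playout to the end of the game
        state.play(random.choice(state.legal_moves()))
    reward = state.result()
    for n in path:  # backpropagation
        n.visits += 1
        n.value += reward

def ensemble_uct(observations, n_trees=10, rollouts=1000):
    """Variant 1 (ensemble UCT): one determinization per tree, majority vote."""
    votes = {}
    for _ in range(n_trees):
        determinization = sample_card_assignment(observations)  # hypothetical helper
        root = Node()
        for _ in range(rollouts):
            run_rollout(root, determinization.clone())
        move = max(root.children, key=lambda m: root.children[m].visits)
        votes[move] = votes.get(move, 0) + 1
    return max(votes, key=votes.get)

def single_uct(observations, rollouts=10000):
    """Variant 2 (single UCT): one tree, a fresh determinization for every rollout;
    children inconsistent with the current determinization are never selected."""
    root = Node()
    for _ in range(rollouts):
        run_rollout(root, sample_card_assignment(observations))
    return max(root.children, key=lambda m: root.children[m].visits)
```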

My intuition was that 2) should make the stronger AI, but it turned out to be weaker, which he pointed out more clearly in a follow-up conference paper from 2015.

Inspired by AlphaGo's success, I started a project with a friend for his bachelor thesis, and he built a policy neural network using a character-based LSTM to guide the selection process of the UCT algorithm. His bachelor thesis only covers test results for ensemble UCT, but I have already tested it for the single UCT player too, and it makes a much stronger AI. I guess this is because the single UCT player benefits a lot more from a more effective reduction of the search space.
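
To give an idea of how such a policy network can guide the selection phase, here is a hedged sketch in the spirit of AlphaGo's PUCT formula, building on the `Node`/`run_rollout` sketch above; `policy_net(state)`, returning a probability for each legal move, is an assumed interface and not the actual thesis code.

```python
import math

C_PUCT = 1.0  # how strongly the prior biases exploration (illustrative value)

def puct(parent, child, prior):
    """Selection score that mixes the value estimate with the policy prior."""
    q = child.value / child.visits if child.visits else 0.0
    u = C_PUCT * prior * math.sqrt(parent.visits + 1) / (1 + child.visits)
    return q + u

def select_with_policy(node, state):
    """Drop-in replacement for the plain UCB selection step above."""
    priors = policy_net(state)  # assumed: {move: probability} over legal moves
    moves = state.legal_moves()
    return max(moves, key=lambda m: puct(node, node.children[m], priors.get(m, 1e-3)))
```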

So this answer is more or less the same as the one @charley has given, but a little more concrete.

路弥 2024-11-26 03:14:55


Problems where you have "incomplete information" tend to be addressed well with "Expert Systems" or filtering mechanisms. Remember, "game theory" merely relates to "optimizing a result" (optimizing something at the expense of everything else), so even approaches like expert systems can be codified into actual user-interactive games.

An "expert system" example of "incomplete information" would be: I need a new car. The universe starts with all "known" cars, or possibly a dynamically (perhaps randomly) generated set of "possible" cars (e.g., different features/models, different manufacturers, different capabilities, etc.) Then, the system can ask me questions, like:

QUESTION: What is most important?

  1. hauling capacity
  2. gas mileage
  3. price
  4. I don't know

The important thing is the "I don't know" option -- it must be available for every question, because the answers will result in "filtering" operations (e.g., removing possible cars from the available set) or "ranking" operations (e.g., sorting some as "preferred" over others).

As it applies specifically to a game engine, you would establish a "universe of possibilities", like hallways you could walk down, tiles on a board, every possible orientation direction, every possible "weapon" that could be used, every possible "enemy individual" or "enemy group", etc.

Then, based on the game dynamics, your job is ONLY to:

  1. Based on rules/context/user-input, REMOVE non-viable options.
  2. Based on rules/context/user-input, SORT/RANK most preferred options.
  3. The item at the "top of the list" is selected/employed RIGHT NOW in the game. (A minimal sketch of this loop follows below.)
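
A minimal sketch of that three-step loop, assuming hypothetical `Option` records and rule objects with `is_viable(option, context)` and `preference(option, context)` methods; the names are illustrative, not from any particular engine.

```python
from dataclasses import dataclass

@dataclass
class Option:
    """One candidate action/item in the 'universe of possibilities'."""
    name: str
    score: float = 0.0

def choose_next(options, rules, context):
    """Filter-and-rank step: remove non-viable options, rank the rest,
    and use whatever ends up at the top of the list right now."""
    # 1. REMOVE options that any rule declares non-viable in this context.
    viable = [o for o in options
              if all(rule.is_viable(o, context) for rule in rules)]
    # 2. SORT/RANK the survivors by the sum of the rules' preference scores.
    for o in viable:
        o.score = sum(rule.preference(o, context) for rule in rules)
    viable.sort(key=lambda o: o.score, reverse=True)
    # 3. The item at the top of the list is selected/employed RIGHT NOW.
    return viable[0] if viable else None
```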

The reason this type of AI works so well relates to the "fuzzy math" domain (a good Google search on its own), where you can intelligently apply the incomplete information that you have, without considering (or tainting your system with) information you do not have, and without "trusting" any atomic unit of information all that much (because the filtering-and-sorting tends to "average out" errors over time).

If you put a "time coefficient" on your filtering-and-sorting (answers to old questions are increasingly considered "suspect", and old items that were "filtered out" or "sorted to the bottom" come back into play with increasing probability), then you can get a really interesting, dynamic, infinitely running game.
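
One possible way to implement that "time coefficient", sketched under the assumption that each answer carries a timestamp and each filtered-out option remembers when it was removed; `timestamp`, `preference_for`, `filtered_at`, the half-life, and the threshold are all illustrative.

```python
import time

def decay(age_seconds, half_life=300.0):
    """An answer loses half its influence every `half_life` seconds (assumed value)."""
    return 0.5 ** (age_seconds / half_life)

def decayed_score(option, answers, now=None):
    """Preference score where older answers count for less."""
    now = time.time() if now is None else now
    return sum(a.preference_for(option) * decay(now - a.timestamp) for a in answers)

def back_in_play(filtered_option, now=None, threshold=0.1):
    """A filtered-out option returns once the evidence against it has decayed enough."""
    now = time.time() if now is None else now
    return decay(now - filtered_option.filtered_at) < threshold
```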

And, that "dyanmic" and "infinitely running" game is before you add the stochastic component, which some games have. Games like Minefield and Battleship and Stratego mostly don't change during the running of the game, so your "localized-and-time-decay answers" may be sufficient for a (very) long running game. However, if you randomly generate new enemies, or your enemies move around, or there is some other random component to the "board setup" (like ocean tides where some paths are only sometimes available), then that adds a whole new level of complexity.

Ocean tides obscuring paths may be on a pseudo-regular or pseudo-randomized schedule. However, the "expert system" or "filtering" concept would suggest that you have a (possibly infinite) set of "events" (where the "ocean tide" is an "event"), and you similarly apply filtering-and-ordering to select the RIGHT NOW event (after you used filtering-and-ordering to decide that an "event" should occur at all, as opposed to all the other non-event options).

国际总奸 2024-11-26 03:14:55


I think Expectimax is generally used for these types of problems. The strategy is to minimize the worst-case expected value of the opponent's score.
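
One common reading of that idea is an expectiminimax-style search: max nodes for our moves, min nodes for the opponent, and expectation nodes wherever hidden information gets resolved. A minimal sketch under an assumed interface (`play(move)` returns the successor state, `chance_outcomes()` returns (probability, state) pairs, `score()` is a heuristic evaluation from our point of view); none of this is a specific library API.

```python
def expectimax(state, depth, maximizing):
    """Max for our moves, min for the opponent, expectation at chance nodes."""
    if depth == 0 or state.is_terminal():
        return state.score()  # heuristic evaluation from our point of view
    if state.is_chance_node():
        # Average over the possible resolutions of the hidden information.
        return sum(p * expectimax(s, depth - 1, maximizing)
                   for p, s in state.chance_outcomes())
    values = [expectimax(state.play(m), depth - 1, not maximizing)
              for m in state.legal_moves()]
    return max(values) if maximizing else min(values)
```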

百合的盛世恋 2024-11-26 03:14:55


You could try implementing some Reinforcement Learning schema. It needs a lot of math, but it is nice to use.

Edit: Here's a link to a great resource on RL.

You can use RL to filter what's important for your AI and what should be ignored. Your AI will learn from its mistakes; it will learn over time and will know what's important for that game and what's not.

Edit2: In a nutshell, RL enables your agent to learn from its experiences, just like we humans learn. We get burned once -- we avoid touching fire. We get a reward after doing something -- we keep doing it for more reward. The same goes for agents using RL.
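
As one concrete example of such a schema, here is a minimal tabular Q-learning sketch; the environment interface (`reset`, `step`, `legal_actions`) and the hyperparameters are assumptions for illustration, not tied to any particular game or library.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=1000, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning with epsilon-greedy exploration."""
    q = defaultdict(float)  # (state, action) -> estimated long-term value
    for _ in range(episodes):
        state = env.reset()                      # assumed environment interface
        done = False
        while not done:
            actions = env.legal_actions(state)   # hypothetical helper
            if random.random() < epsilon:
                action = random.choice(actions)                      # explore
            else:
                action = max(actions, key=lambda a: q[(state, a)])   # exploit
            next_state, reward, done = env.step(action)
            # Move the estimate toward reward + discounted value of the best next action.
            best_next = max((q[(next_state, a)] for a in env.legal_actions(next_state)),
                            default=0.0)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q
```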

总以为 2024-11-26 03:14:55


I'm not aware of any general strategies but I have implemented a game with hidden moves and a decent AI.

The AI attempted to figure out what the human was up to by looking at all possible attack targets and identifying which target the moves seemed to be concentrating on. It would try to reinforce if possible, or evacuate if it couldn't expect to hold.
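
A hedged sketch of that kind of inference: score each possible target by how much the observed moves converge on it, then decide between reinforcing and evacuating. The move/position interface, strength estimates, and thresholds are all illustrative, not the answerer's actual code.

```python
def likely_target(observed_moves, possible_targets):
    """Score each target by how much the visible movement closes in on it."""
    scores = {t: 0.0 for t in possible_targets}
    for move in observed_moves:
        for target in possible_targets:
            # A move counts more the more it reduces the distance to the target.
            progress = move.start.distance_to(target) - move.end.distance_to(target)
            scores[target] += max(progress, 0.0)
    return max(scores, key=scores.get)

def respond(target, my_strength, estimated_attack, reserves):
    """Reinforce if the position looks holdable, otherwise evacuate."""
    if my_strength + reserves >= estimated_attack:
        return ("reinforce", target)
    return ("evacuate", target)
```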

While it could sometimes be decoyed by attacking two targets at once, this usually meant weak attacks, and it didn't work very well unless you were already winning.

It was tough enough that most people I gave copies to couldn't beat it at parity.
