Strategies for games with incomplete information

Published 2024-11-19 03:14:55


Are there general strategies for games with incomplete information, especially trick-taking games like Bridge and Doppelkopf?

I am interested in ways to implement AI strategies for such games.

The bounty is for the answer with the best description of a particular strategy.

Comments (5)

谁许谁一生繁华 2024-11-26 03:14:55


I would like to contribute some concrete information for the card game Doppelkopf, which the question mentions as an example. In 2012, Sievers wrote a master's thesis in which he adapted the UCT algorithm to the Doppelkopf game.

UCT normally assumes a perfect-information game, so he first solved the "card assignment" problem: guessing a card assignment for each player based on the cards that are already known.

With a solution to the card assignment problem in hand, he tried two ways of running the algorithm:

1) Guess a card assignment for each UCT tree and average the results over multiple trees. He calls this strategy ensemble UCT.

2) Take a single UCT tree and guess a new assignment for each rollout. In the selection phase of UCT you simply ignore all children that are inconsistent with the current assignment. He calls this strategy single UCT. (A rough sketch of both variants follows below.)
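
Here is a rough sketch of both variants, assuming a hypothetical `GameState` interface (`clone()`, `legal_moves()`, `play(move)`, `is_terminal()`, `result()`) and a `sample_card_assignment(observations)` helper that produces one determinization consistent with what the player has seen. None of these names come from the thesis, and the reward handling is deliberately simplified.

```python
import math
import random

C = 1.4  # UCT exploration constant

class Node:
    def __init__(self):
        self.visits = 0
        self.value = 0.0
        self.children = {}  # move -> Node

def ucb(parent, child):
    """UCB1 score used in the selection phase."""
    if child.visits == 0:
        return float("inf")
    return (child.value / child.visits
            + C * math.sqrt(math.log(parent.visits + 1) / child.visits))

def run_rollout(root, state):
    """One UCT iteration: selection, expansion, random playout, backpropagation.
    Simplification: every node maximizes the same score (the AI's result)."""
    node, path = root, [root]
    while not state.is_terminal():
        moves = state.legal_moves()
        untried = [m for m in moves if m not in node.children]
        if untried:  # expansion: add one new child, then stop descending
            move = random.choice(untried)
            child = Node()
            node.children[move] = child
            state.play(move)
            node = child
            path.append(node)
            break
        # Selection: only children whose moves are legal in *this* determinization
        # are considered, so inconsistent children are ignored automatically.
        move = max(moves, key=lambda m: ucb(node, node.children[m]))
        state.play(move)
        node = node.children[move]
        path.append(node)
    while not state.is_terminal():  # random playout to the end of the game
        state.play(random.choice(state.legal_moves()))
    reward = state.result()
    for n in path:  # backpropagation
        n.visits += 1
        n.value += reward

def ensemble_uct(observations, n_trees=10, rollouts=1000):
    """Variant 1 (ensemble UCT): one determinization per tree, majority vote."""
    votes = {}
    for _ in range(n_trees):
        determinization = sample_card_assignment(observations)  # hypothetical helper
        root = Node()
        for _ in range(rollouts):
            run_rollout(root, determinization.clone())
        move = max(root.children, key=lambda m: root.children[m].visits)
        votes[move] = votes.get(move, 0) + 1
    return max(votes, key=votes.get)

def single_uct(observations, rollouts=10000):
    """Variant 2 (single UCT): one tree, a fresh determinization for every rollout;
    children inconsistent with the current determinization are never selected."""
    root = Node()
    for _ in range(rollouts):
        run_rollout(root, sample_card_assignment(observations))
    return max(root.children, key=lambda m: root.children[m].visits)
```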

My intuition was that 2) should make the stronger AI, but it turned out to be weaker, which he pointed out more clearly in a follow-up conference paper from 2015.

Inspired by AlphaGo's success, I started a project with a friend for his bachelor thesis, and he built a policy neural network using a character-based LSTM to guide the selection process of the UCT algorithm. His bachelor thesis only covers test results for ensemble UCT, but I have already tested it for the single UCT player too, and it makes a much stronger AI. I guess this is because the single UCT player benefits a lot more from a more effective reduction of the search space.
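
To give an idea of how such a policy network can guide the selection phase, here is a hedged sketch in the spirit of AlphaGo's PUCT formula, building on the `Node`/`run_rollout` sketch above; `policy_net(state)`, returning a probability for each legal move, is an assumed interface and not the actual thesis code.

```python
import math

C_PUCT = 1.0  # how strongly the prior biases exploration (illustrative value)

def puct(parent, child, prior):
    """Selection score that mixes the value estimate with the policy prior."""
    q = child.value / child.visits if child.visits else 0.0
    u = C_PUCT * prior * math.sqrt(parent.visits + 1) / (1 + child.visits)
    return q + u

def select_with_policy(node, state):
    """Drop-in replacement for the plain UCB selection step above."""
    priors = policy_net(state)  # assumed: {move: probability} over legal moves
    moves = state.legal_moves()
    return max(moves, key=lambda m: puct(node, node.children[m], priors.get(m, 1e-3)))
```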

So this answer is more or less the same as the one @charley has given, but a little more concrete.

路弥 2024-11-26 03:14:55


Problems where you have "incomplete information" tend to be addressed well with "Expert Systems" or filtering mechanisms. Remember, "game theory" merely relates to "optimizing a result" (optimizing something at the expense of everything else), so even approaches like expert systems can be codified into actual user-interactive games.

An "expert system" example of "incomplete information" would be: I need a new car. The universe starts with all "known" cars, or possibly a dynamically (perhaps randomly) generated set of "possible" cars (e.g., different features/models, different manufacturers, different capabilities, etc.) Then, the system can ask me questions, like:

QUESTION: What is most important?

  1. hauling capacity
  2. gas mileage
  3. price
  4. I don't know

The important thing is the "I don't know" option -- it must be available for every question, because the answers will result in "filtering" operations (e.g., removing possible cars from the available set) or "ranking" operations (e.g., sorting some as "preferred" over others).

As it applies specifically to a game engine, you would establish a "universe of possibilities", like hallways you could walk down, tiles on a board, every possible orientation direction, every possible "weapon" that could be used, every possible "enemy individual" or "enemy group", etc.

Then, based on the game dynamics, your job is ONLY to:

  1. Based on rules/context/user-input, REMOVE non-viable options.
  2. Based on rules/context/user-input, SORT/RANK most preferred options.
  3. The item at the "top of the list" is selected/employed RIGHT NOW in the game. (A minimal sketch of this loop follows below.)
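
A minimal sketch of that three-step loop, assuming hypothetical `Option` records and rule objects with `is_viable(option, context)` and `preference(option, context)` methods; the names are illustrative, not from any particular engine.

```python
from dataclasses import dataclass

@dataclass
class Option:
    """One candidate action/item in the 'universe of possibilities'."""
    name: str
    score: float = 0.0

def choose_next(options, rules, context):
    """Filter-and-rank step: remove non-viable options, rank the rest,
    and use whatever ends up at the top of the list right now."""
    # 1. REMOVE options that any rule declares non-viable in this context.
    viable = [o for o in options
              if all(rule.is_viable(o, context) for rule in rules)]
    # 2. SORT/RANK the survivors by the sum of the rules' preference scores.
    for o in viable:
        o.score = sum(rule.preference(o, context) for rule in rules)
    viable.sort(key=lambda o: o.score, reverse=True)
    # 3. The item at the top of the list is selected/employed RIGHT NOW.
    return viable[0] if viable else None
```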

The reason this type of AI works so well relates to the "fuzzy math" domain (a good Google search on its own), where you can intelligently apply the incomplete information that you have, without considering (or tainting your system with) information you do not have, and without "trusting" any atomic unit of information all that much (because the filtering-and-sorting tends to "average out" errors over time).

If you put a "time coefficient" on your filtering-and-sorting (answers to old questions are increasingly considered "suspect", and old items that were "filtered out" or "sorted to the bottom" come back into play with increasing probability), then you can get a really interesting, dynamic, infinitely running game.
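
One possible way to implement that "time coefficient", sketched under the assumption that each answer carries a timestamp and each filtered-out option remembers when it was removed; `timestamp`, `preference_for`, `filtered_at`, the half-life, and the threshold are all illustrative.

```python
import time

def decay(age_seconds, half_life=300.0):
    """An answer loses half its influence every `half_life` seconds (assumed value)."""
    return 0.5 ** (age_seconds / half_life)

def decayed_score(option, answers, now=None):
    """Preference score where older answers count for less."""
    now = time.time() if now is None else now
    return sum(a.preference_for(option) * decay(now - a.timestamp) for a in answers)

def back_in_play(filtered_option, now=None, threshold=0.1):
    """A filtered-out option returns once the evidence against it has decayed enough."""
    now = time.time() if now is None else now
    return decay(now - filtered_option.filtered_at) < threshold
```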

And, that "dyanmic" and "infinitely running" game is before you add the stochastic component, which some games have. Games like Minefield and Battleship and Stratego mostly don't change during the running of the game, so your "localized-and-time-decay answers" may be sufficient for a (very) long running game. However, if you randomly generate new enemies, or your enemies move around, or there is some other random component to the "board setup" (like ocean tides where some paths are only sometimes available), then that adds a whole new level of complexity.

Ocean tides obscuring paths may be on a pseudo-regular or pseudo-randomized schedule. However, the "expert system" or "filtering" concept would suggest that you have a (possibly infinite) set of "events" (where the "ocean tide" is an "event"), and you similarly apply filtering-and-ordering to select the RIGHT NOW event (after you used filtering-and-ordering to decide that an "event" should occur at all, as opposed to all the other non-event options).

国际总奸 2024-11-26 03:14:55


I think Expectimax is generally used for these types of problems. The strategy is to minimize the worst-case expected value of the opponent's score.
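
One common reading of that idea is an expectiminimax-style search: max nodes for our moves, min nodes for the opponent, and expectation nodes wherever hidden information gets resolved. A minimal sketch under an assumed interface (`play(move)` returns the successor state, `chance_outcomes()` returns (probability, state) pairs, `score()` is a heuristic evaluation from our point of view); none of this is a specific library API.

```python
def expectimax(state, depth, maximizing):
    """Max for our moves, min for the opponent, expectation at chance nodes."""
    if depth == 0 or state.is_terminal():
        return state.score()  # heuristic evaluation from our point of view
    if state.is_chance_node():
        # Average over the possible resolutions of the hidden information.
        return sum(p * expectimax(s, depth - 1, maximizing)
                   for p, s in state.chance_outcomes())
    values = [expectimax(state.play(m), depth - 1, not maximizing)
              for m in state.legal_moves()]
    return max(values) if maximizing else min(values)
```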

百合的盛世恋 2024-11-26 03:14:55


You could try implementing some Reinforcement Learning schema. It needs a lot of math, but it is nice to use.

Edit: Here's a link to a great resource on RL.

You can use RL to filter what's important for your AI and what should be ignored. Your AI will learn from its mistakes; it will learn over time and will know what's important for that game and what's not.

Edit2: In a nutshell, RL enables your agent to learn from its experiences, just like we humans learn. We get burned once -- we avoid touching fire. We get a reward after doing something -- we keep doing it for more reward. The same goes for agents using RL.
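
As one concrete example of such a schema, here is a minimal tabular Q-learning sketch; the environment interface (`reset`, `step`, `legal_actions`) and the hyperparameters are assumptions for illustration, not tied to any particular game or library.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=1000, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning with epsilon-greedy exploration."""
    q = defaultdict(float)  # (state, action) -> estimated long-term value
    for _ in range(episodes):
        state = env.reset()                      # assumed environment interface
        done = False
        while not done:
            actions = env.legal_actions(state)   # hypothetical helper
            if random.random() < epsilon:
                action = random.choice(actions)                      # explore
            else:
                action = max(actions, key=lambda a: q[(state, a)])   # exploit
            next_state, reward, done = env.step(action)
            # Move the estimate toward reward + discounted value of the best next action.
            best_next = max((q[(next_state, a)] for a in env.legal_actions(next_state)),
                            default=0.0)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q
```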

总以为 2024-11-26 03:14:55


I'm not aware of any general strategies but I have implemented a game with hidden moves and a decent AI.

The AI attempted to figure out what the human was up to by looking at all possible attack targets and identifying which target the moves seemed to be concentrating on. It would try to reinforce if possible, or evacuate if it couldn't expect to hold.
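
A hedged sketch of that kind of inference: score each possible target by how much the observed moves converge on it, then decide between reinforcing and evacuating. The move/position interface, strength estimates, and thresholds are all illustrative, not the answerer's actual code.

```python
def likely_target(observed_moves, possible_targets):
    """Score each target by how much the visible movement closes in on it."""
    scores = {t: 0.0 for t in possible_targets}
    for move in observed_moves:
        for target in possible_targets:
            # A move counts more the more it reduces the distance to the target.
            progress = move.start.distance_to(target) - move.end.distance_to(target)
            scores[target] += max(progress, 0.0)
    return max(scores, key=scores.get)

def respond(target, my_strength, estimated_attack, reserves):
    """Reinforce if the position looks holdable, otherwise evacuate."""
    if my_strength + reserves >= estimated_attack:
        return ("reinforce", target)
    return ("evacuate", target)
```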

While it could sometimes be decoyed by attacking two targets at once, this usually meant weak attacks, and it didn't work very well unless you were already winning.

It was tough enough that most people I gave copies to couldn't beat it at parity.
