Determining which inputs to weigh in an evolutionary algorithm
I once wrote a Tetris AI that played Tetris quite well. The algorithm I used (described in this paper) is a two-step process.
In the first step, the programmer decides to track inputs that are "interesting" to the problem. In Tetris we might be interested in tracking how many gaps there are in a row because minimizing gaps could help place future pieces more easily. Another might be the average column height because it may be a bad idea to take risks if you're about to lose.
The second step is determining weights associated with each input. This is the part where I used a genetic algorithm. Any learning algorithm will do here, as long as the weights are adjusted over time based on the results. The idea is to let the computer decide how the input relates to the solution.
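As an illustration of that second step, here is a minimal sketch of a genetic-algorithm loop over weight vectors. It is not the paper's algorithm; the names are made up, and the toy play_game fitness function stands in for actually simulating Tetris games scored with the given weights.

    import random

    NUM_FEATURES = 2        # e.g. number of gaps, average column height
    POPULATION_SIZE = 20
    GENERATIONS = 50

    def play_game(weights):
        """Hypothetical fitness function. In the real AI this would simulate
        Tetris games evaluated with these weights and return, say, the number
        of lines cleared; here it just rewards weights near a made-up target
        so the loop can be run end to end."""
        target = [-0.7, -0.3]
        return -sum((w - t) ** 2 for w, t in zip(weights, target))

    def evolve_weights():
        # Start from random weight vectors.
        population = [[random.uniform(-1, 1) for _ in range(NUM_FEATURES)]
                      for _ in range(POPULATION_SIZE)]
        for _ in range(GENERATIONS):
            # Keep the better half, judged by the fitness function.
            survivors = sorted(population, key=play_game,
                               reverse=True)[:POPULATION_SIZE // 2]
            # Refill the population with mutated crossovers of survivors.
            children = []
            while len(survivors) + len(children) < POPULATION_SIZE:
                a, b = random.sample(survivors, 2)
                children.append([random.choice(pair) + random.gauss(0, 0.05)
                                 for pair in zip(a, b)])
            population = survivors + children
        return max(population, key=play_game)

    best_weights = evolve_weights()

In the real system the fitness evaluation is the expensive part: each candidate weight vector plays one or more full games and its results become its fitness.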
Using these inputs and their weights we can determine the value of taking any action. For example, if putting the straight line shape all the way in the right column will eliminate the gaps of 4 different rows, then this action could get a very high score if its weight is high. Likewise, laying it flat on top might actually cause gaps and so that action gets a low score.
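A rough sketch of that evaluation step (the feature definitions here are simplified stand-ins, not the exact ones from the paper): the value of a candidate placement is just the weighted sum of the features of the board it would produce.

    def board_features(board):
        """board: a list of columns, each a list of booleans from bottom to top.
        Returns two simplified features: total gaps and average column height."""
        gaps = 0
        heights = []
        for column in board:
            filled = [i for i, cell in enumerate(column) if cell]
            height = filled[-1] + 1 if filled else 0
            heights.append(height)
            # A gap is an empty cell somewhere below the top filled cell.
            gaps += sum(1 for i in range(height) if not column[i])
        return [gaps, sum(heights) / len(heights)]

    def score_action(board_after_action, weights):
        """Value of an action = weighted sum of the resulting board's features."""
        return sum(w * f for w, f in
                   zip(weights, board_features(board_after_action)))

The AI then tries every legal placement of the current piece, scores each resulting board with score_action, and plays the highest-scoring move.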
I've always wondered if there's a way to apply a learning algorithm to the first step, where we find "interesting" potential inputs. It seems possible to write an algorithm where the computer first learns what inputs might be useful, then applies learning to weigh those inputs. Has anything been done like this before? Is it already being used in any AI applications?
Comments (3)
In neural networks, you can select "interesting" potential inputs by finding the ones that have the strongest correlation, positive or negative, with the classifications you're training for. I imagine you can do something similar in other contexts.
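As a minimal sketch of that idea (the data here is synthetic and the candidate inputs are hypothetical), you can rank candidate inputs by the absolute value of their correlation with the outcome you care about:

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic example: each row is an observed game state, each column a
    # candidate input (say gaps, average height, bumpiness).
    X = rng.random((1000, 3))
    # Outcome to predict, e.g. lines cleared shortly afterwards.
    y = -2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.1, 1000)

    # Pearson correlation of each candidate input with the outcome.
    correlations = [np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])]

    # The inputs with the strongest positive or negative correlation are
    # the "interesting" ones to keep.
    ranked = sorted(range(X.shape[1]),
                    key=lambda j: abs(correlations[j]), reverse=True)
    print("inputs ranked by |correlation|:", ranked)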
I think I might approach the problem you're describing by feeding more primitive data to a learning algorithm. For instance, a Tetris game state may be described by the list of occupied cells, and a string of bits describing this information would be a suitable input to that stage of the learning algorithm. Actually training on that is still challenging, though: how do you know whether the results are useful? I suppose you could roll the whole thing into a single blob, where the algorithm is fed the successive states of play, its output is just the block placements, and higher-scoring algorithms are selected for future generations.
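A minimal sketch of that raw encoding, assuming a standard 10x20 board stored as rows of booleans:

    def board_to_bits(board):
        """board: 20 rows x 10 columns of booleans (True = occupied cell).
        Returns a flat tuple of 0/1 values, one bit per cell, usable as the
        raw input vector for a learning algorithm."""
        return tuple(1 if cell else 0 for row in board for cell in row)

    # An empty board becomes 200 zero bits.
    empty_board = [[False] * 10 for _ in range(20)]
    assert len(board_to_bits(empty_board)) == 200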
Another choice might be to use a large corpus of plays from other sources, such as recorded games from human players or a hand-crafted AI, and select the algorithms whose outputs bear a strong correlation to some interesting fact about the future play, such as the score earned over the next 10 moves.
Yes, there is a way.
If you have M candidate features, there are 2^M possible subsets, so there is a lot to look at.
I would do the following: choose a number of different feature subsets S, and for each subset run your weight-optimization step (the genetic algorithm) to obtain a weight vector W.
Then, for each pair (S, W), run G games and save the score L that pair achieves. Now you have a table like this:

    S (feature subset)    W (optimized weights)    L (score over G games)
    S1                    W1                       L1
    S2                    W2                       L2
    ...                   ...                      ...
Now you can run some component-selection algorithm (PCA, for example) and decide which features are worth keeping in order to explain the score L.
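The table analysis could look something like the sketch below. Instead of PCA it uses a simpler per-feature comparison (average score with the feature included versus excluded) on synthetic data, but the goal is the same: decide which features actually help explain the score L.

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic stand-in for the table above: one row per (S, W) pair.
    # inclusion[i, j] = 1 if feature j was in subset S_i, 0 otherwise.
    inclusion = rng.integers(0, 2, size=(100, 5))
    # scores[i] = score L achieved by that pair; here features 0 and 2 matter.
    scores = inclusion @ np.array([3.0, 0.1, 2.0, -1.0, 0.0]) \
             + rng.normal(0, 0.5, 100)

    # Average score with each feature included minus without it.
    for j in range(inclusion.shape[1]):
        included = scores[inclusion[:, j] == 1].mean()
        excluded = scores[inclusion[:, j] == 0].mean()
        print(f"feature {j}: score change when included = {included - excluded:+.2f}")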
A tip: When running the code to optimize W, seed the random number generator, so that each different 'evolving brain' is tested against the same piece sequence.
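A small sketch of that tip, assuming the piece sequence is generated with Python's random module:

    import random

    PIECES = "IJLOSTZ"

    def piece_sequence(length, seed=42):
        """Fixed seed -> every candidate weight vector sees exactly the same
        pieces, so score differences come from the weights, not from luck."""
        rng = random.Random(seed)
        return [rng.choice(PIECES) for _ in range(length)]

    assert piece_sequence(100) == piece_sequence(100)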
I hope it helps!