Game of Life memory optimization

Posted on 2024-09-15 08:54:38

Hi, I have written a simple Game of Life implementation that uses 2 arrays: one maintains the current state and the other maintains the next state.
Can anyone suggest how to optimize this program for memory consumption? Ideally, I would like its space complexity to be less than RC (R rows times C columns).

5 Answers

少女的英雄梦 2024-09-22 08:54:52

I recommend sparse matrices, as nonnb and Amber recommended. You could also look into encoding, if you have processor power to burn, or want to serialize to disk. RLE or token based compression might get you somewhere.
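
As a rough illustration of the RLE idea (a sketch with invented names, not tied to any particular library or to the original answer), each grid row can be serialized as alternating pairs of run length and cell value:

#include <stddef.h>
#include <stdint.h>

/* Run-length encode one row of cells (0 = dead, 1 = alive).
 * Output format: pairs of bytes (run length, cell value), runs capped at 255.
 * Returns the number of bytes written; 'out' needs at most 2 * width bytes. */
size_t rle_encode_row(const uint8_t *row, size_t width, uint8_t *out)
{
    size_t n = 0;
    size_t i = 0;
    while (i < width) {
        uint8_t value = row[i];
        uint8_t run = 0;
        while (i < width && row[i] == value && run < 255) {
            run++;
            i++;
        }
        out[n++] = run;
        out[n++] = value;
    }
    return n;
}

Because Life patterns tend to be blobby, the dead runs are long, so this mainly pays off when serializing to disk, as the answer suggests.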

橙味迷妹 2024-09-22 08:54:46

The other answerers have a good point in looking for other data structures for your states. But regardless of that, a simple optimisation is to work with two pointers to states (you might do this already). So instead of doing two array copies you do one copy and three pointer assignments:

state curr, next;
[...]
for (;;) {
    next = advance(curr);
    curr = copy(next);
}

To this:

state s1, s2;
state *curr, *next;
[...]
for (;;) {
    *next = advance(*curr);
    /* swap pointers */
    state *tmp = curr;
    curr = next;
    next = tmp;
}
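
For concreteness, here is a self-contained sketch of the same double-buffering idea on a small fixed grid. The grid size, the advance signature and the glider seed are illustrative assumptions, not part of the original answer; passing the output buffer into advance even removes the one remaining copy, leaving only the three pointer assignments per generation.

#include <stdio.h>
#include <string.h>

#define R 16
#define C 16

typedef struct { unsigned char cell[R][C]; } state;

/* Compute the next generation of 'in' into 'out' (toroidal wrap-around). */
static void advance(const state *in, state *out)
{
    for (int r = 0; r < R; r++) {
        for (int c = 0; c < C; c++) {
            int live = 0;
            for (int dr = -1; dr <= 1; dr++)
                for (int dc = -1; dc <= 1; dc++) {
                    if (dr == 0 && dc == 0) continue;
                    live += in->cell[(r + dr + R) % R][(c + dc + C) % C];
                }
            out->cell[r][c] = (live == 3) || (live == 2 && in->cell[r][c]);
        }
    }
}

int main(void)
{
    state s1, s2;
    state *curr = &s1, *next = &s2;

    memset(&s1, 0, sizeof s1);
    /* Seed a glider. */
    s1.cell[1][2] = s1.cell[2][3] = 1;
    s1.cell[3][1] = s1.cell[3][2] = s1.cell[3][3] = 1;

    for (int gen = 0; gen < 100; gen++) {
        advance(curr, next);   /* write the next generation into the spare buffer */
        state *tmp = curr;     /* swap pointers instead of copying arrays */
        curr = next;
        next = tmp;
    }
    printf("done after 100 generations\n");
    return 0;
}
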
世界等同你 2024-09-22 08:54:44

Late answer, but still.

Check out the source code for Golly: http://golly.sourceforge.net/

But before you do that, go to: http://www.ibiblio.org/lifepatterns/lifeapplet.html
And have a look at the code there. It's in Java, but it's very very fast.

This quote from Alan Hensel's website should answer your question:

How did you make it go so fast?
Okay, the casual observer might not notice that my applet is blazingly fast. You might not have found the "Warp Speed" button, or you might not have used it, or you might not have been impressed with it. You can skip to the next section.

However, some of you have asked, how the heck did you make it go so fast?! To the curious, or to those thinking of writing their own super-fast cellular automata program, I will explain.

I tend to think of cellular automata optimization as being related to data compression. This is also a simple concept with no simple solution, and what solutions are best depends on the type of data being processed. In Conway's Life, patterns tend to be blobby.

For blobby universes, one should probably consider dividing the universe up into blocks approximately the size of the blobs. For Life, 4x4 to 8x8 seem reasonable. I chose the upper bound, 8x8, for reasons of convenience: There happen to be 8 bits in a byte. I strongly considered 4x4, but it didn't work out as nice....

You should put the blocks in some kind of list, so that you waste zero time in the empty parts of the universe.

Already, note a complication: New elements in the list must be introduced if the pattern grows over a block's boundaries, but we have to know if the block's neighbor already exists. You can either do a simple linear search of the list, or binary search, or keep some kind of map. I chose to make a hash table. This is solely used for finding the neighbors of a new block; each existing block already keeps a pointer to its neighbors, as they will be referenced often.
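
A minimal sketch of the kind of block-plus-hash-table structure described here, with invented names and a deliberately simple chained hash keyed on block coordinates; the applet's actual layout will differ:

#include <stdint.h>
#include <stdlib.h>

/* One 8x8 block of cells, one bit per cell, plus cached neighbour pointers. */
typedef struct Block {
    int32_t bx, by;          /* block coordinates in the universe        */
    uint8_t rows[8];         /* 8 bytes = 64 cells                       */
    struct Block *nbr[8];    /* N, NE, E, SE, S, SW, W, NW (may be NULL) */
    struct Block *next;      /* chaining within a hash bucket            */
} Block;

#define NBUCKETS 4096

static Block *buckets[NBUCKETS];

static unsigned hash_coords(int32_t bx, int32_t by)
{
    uint32_t h = (uint32_t)bx * 73856093u ^ (uint32_t)by * 19349663u;
    return h % NBUCKETS;
}

/* Look up the block at (bx, by); only needed when a pattern spills into a
 * block we may not have created yet. */
static Block *find_block(int32_t bx, int32_t by)
{
    for (Block *b = buckets[hash_coords(bx, by)]; b != NULL; b = b->next)
        if (b->bx == bx && b->by == by)
            return b;
    return NULL;
}

static Block *insert_block(int32_t bx, int32_t by)
{
    Block *b = calloc(1, sizeof *b);   /* error handling omitted for brevity */
    b->bx = bx;
    b->by = by;
    unsigned h = hash_coords(bx, by);
    b->next = buckets[h];
    buckets[h] = b;
    return b;
}

Steady-state neighbour access goes through the cached nbr pointers; the hash is consulted only when a new block has to be linked in, which matches the description above.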

There must also be an efficient algorithm within the blocks. I chose to primarily blaze straight thru each block. There are no inner loops until all cells in a block are processed. Also, fast-lookup tables are employed. I look up 4x4 blocks to determine the inner 2x2.
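
A sketch of how such a 4x4-to-2x2 lookup table could be precomputed for Conway's rule; the indexing scheme and names are illustrative assumptions, not Hensel's actual layout:

#include <stdint.h>

/* next2x2[n] holds the next generation of the inner 2x2 of a 4x4
 * neighbourhood, where bit (4*row + col) of n is the cell at (row, col)
 * and result bits 0..3 correspond to cells (1,1), (1,2), (2,1), (2,2). */
static uint8_t next2x2[1 << 16];

static int cell(uint16_t n, int row, int col)
{
    return (n >> (4 * row + col)) & 1;
}

void build_lookup_table(void)
{
    for (uint32_t n = 0; n < (1u << 16); n++) {
        uint8_t out = 0;
        int bit = 0;
        for (int row = 1; row <= 2; row++) {
            for (int col = 1; col <= 2; col++) {
                int live = 0;
                for (int dr = -1; dr <= 1; dr++)
                    for (int dc = -1; dc <= 1; dc++)
                        if (dr != 0 || dc != 0)
                            live += cell((uint16_t)n, row + dr, col + dc);
                if (live == 3 || (live == 2 && cell((uint16_t)n, row, col)))
                    out |= (uint8_t)(1 << bit);
                bit++;
            }
        }
        next2x2[n] = out;
    }
}

At run time the per-cell neighbour counting collapses into a single table index per 2x2 group of cells.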

Note: CA programs typically consist of 2 main loops (plus a display loop), because CA rules operate on the cells in parallel, while the microprocessor is conceptually serial. This means that there must be two copies of the universe, effectively, so that no important info is destroyed in the process of creating the next generation. Often these 2 copies are not symmetrical. It was a great struggle for me, since almost every time I took something out of one loop to make it faster, I had to add something else to the other loop! Almost every time, that is; the exceptions to that rule lead to the best optimizations. In particular, there are good tradeoffs to be considered in bit-manipulations: shifting, masking, recombining to form an address in the lookup table....

It can also be considered that sometimes the contents of a block may stabilize, requiring no further processing. You could take the block out of the list, putting it in a "hibernation" state, only to be re-activated if a neighboring block has some activity spilling into it. These blocks would take zero processing time, just like a blank region of the universe.

Period-2 oscillators might also not be very difficult to detect, and remove from the processing time. This might be worthwhile in Life, because the blinker is the most common kind of random debris. Higher period oscillators are much more rare. It is also possible that gliders could be detected and simulated. You will get diminishing returns from this kind of optimization, unless you take it to an extreme (cf. HashLife).

Also, a block of cells that's completely empty might not be worth deallocating and removing from the hash table for a while. That takes some processing time, which could be significant in the case of an oscillator moving in and out of its space repeatedly. Only when memory gets low should the oldest blocks from the "morgue" be recycled.

When the program is fast enough, it should be considered that it isn't worth displaying generations any faster than the eye can see, or at least not much faster than the refresh rate of the monitor. Especially in windowed environments, display time can be a real bottleneck.

Then read this piece of source code: http://www.ibiblio.org/lifepatterns/src.41d/LifeCell.java
It contains the data structures that hold Alan's Life universe.

Forget about HashLife; it is so complex it will make your head spin.

瞳孔里扚悲伤 2024-09-22 08:54:41

I don't know too much about GOL, but assuming that there are many empty 'squares', have you looked at sparse matrices?

荭秂 2024-09-22 08:54:40

It depends on how sparse your playing field is. If your playing field is dense, then the space complexity of any storage algorithm will trend towards RC. (Specifically, RC/2, since once you get more active cells than inactive cells, you can just store the inactive cells instead if you really cared that much about optimal space usage.)

If the playing field is sparse, you can get something that scales instead with the number of active cells by simply storing coordinate pairs per active cell or some other sparse matrix structure.
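
A sketch of that coordinate-pair approach, with invented names and packing; memory scales with the number of live cells rather than with R*C. Live cells are kept as a sorted array of packed (row, column) keys; each live cell "touches" its eight neighbours, and the touch counts decide the next generation:

#include <stdint.h>
#include <stdlib.h>

typedef uint64_t cellkey;   /* high 32 bits: row, low 32 bits: column */

static cellkey pack(int32_t r, int32_t c)
{
    return ((uint64_t)(uint32_t)r << 32) | (uint32_t)c;
}

static int cmp_key(const void *a, const void *b)
{
    cellkey x = *(const cellkey *)a, y = *(const cellkey *)b;
    return (x > y) - (x < y);
}

static int is_live(const cellkey *live, size_t n, cellkey k)
{
    return bsearch(&k, live, n, sizeof k, cmp_key) != NULL;
}

/* Compute the next generation. 'live' must be sorted; the result (also sorted)
 * is written to 'out' (capacity 4 * n always suffices) and its size returned. */
size_t step(const cellkey *live, size_t n, cellkey *out)
{
    /* Every live cell contributes one "touch" to each of its 8 neighbours. */
    size_t ntouch = 0;
    cellkey *touch = malloc(8 * n * sizeof *touch);   /* error handling omitted */
    for (size_t i = 0; i < n; i++) {
        int32_t r = (int32_t)(uint32_t)(live[i] >> 32);
        int32_t c = (int32_t)(uint32_t)live[i];
        for (int dr = -1; dr <= 1; dr++)
            for (int dc = -1; dc <= 1; dc++)
                if (dr != 0 || dc != 0)
                    touch[ntouch++] = pack(r + dr, c + dc);
    }
    qsort(touch, ntouch, sizeof *touch, cmp_key);

    /* A cell lives next generation if it was touched exactly 3 times,
     * or exactly 2 times while currently alive. */
    size_t nout = 0;
    for (size_t i = 0; i < ntouch; ) {
        size_t j = i;
        while (j < ntouch && touch[j] == touch[i])
            j++;
        size_t count = j - i;
        if (count == 3 || (count == 2 && is_live(live, n, touch[i])))
            out[nout++] = touch[i];
        i = j;
    }
    free(touch);
    return nout;
}

A caller would sort the initial live list once with qsort, then repeatedly call step and swap the input and output arrays between generations.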
