如何有效存储大量排列?
假设我们有一个元素列表:
[{dog,1},{dog,2},{cat,1},{cat,2},{bird,1},{bird,2},...]
我想将此列表的所有可能的 排列 存储在内存。
由于列表可能相当长(10 个元素或更多),因此需要大量空间来存储它(阶乘 N)。
例如,如果我有一个列表,它消耗大约 70 字节的空间并有 12 个元素,那么我需要 12! * 70 ~ 31 GB。如果我在列表中再添加一个元素,那么将排列存储在 RAM 中可能变得不可行。
有没有比下面的 Erlang 表示更有效的表示来将所有排列保留在内存中?
[{dog,1},{dog,2},{cat,1},{cat,2},{bird,1},{bird,2},...]
(我知道原子 dog
在原子表中仅存储一次,但由于它在每个排列中都会重复,因此需要 N 内存)。
也许这些排列可以以某种字节表示形式存储? (抱歉,我是字节和二进制方面的新手)。
毕竟,它只是相同的元素,但以不同的方式重新排列。
Let's say we have a list of elements:
[{dog,1},{dog,2},{cat,1},{cat,2},{bird,1},{bird,2},...]
I would like to store all possible permutations of this list in the RAM.
Since the list can be pretty long (10 elements and more), it takes a lot of space to store it (factorial N).
For instance, if I have a list, which consumes about 70 bytes of space and has 12 elements, then I need 12! * 70 ~ 31 GB
. If I add just one more element to the list, then it could become unfeasible to store the permutations in the RAM.
Is there any more efficient representation to keep all the permutations in the memory than the following Erlang representation?
[{dog,1},{dog,2},{cat,1},{cat,2},{bird,1},{bird,2},...]
(I know that the atom dog
is stored only once in the atoms table, but since it is repeated in every permutation, it takes N memory).
Maybe these permutations could be stored in some kind of byte representation? (Sorry, I am a newbie in bytes and binaries).
After all, it is just the same elements, but rearranged in different ways.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
为什么不懒惰地生产它们呢?保留每个列表的索引,当需要新元素时,您可以即时生成组合。这样,您只需随时将源元素的初始列表加上索引存储在内存中。
例如(如果您需要迭代排列):
每次达到
index_b
的最大值时,您将其重置为0
并递增index_a
与一个。然后,访问列表的第 N 个元素(其中 N 是索引),您可以重新创建任何排列实例。当然,这意味着每次生成排列时都必须遍历列表。为了避免这种情况,您可以使用列表本身作为索引:
要生成下一个排列,请从
list_b
中弹出新元素并将其附加到list_a
的头部。如果list_b
为空,则删除list_a
的头部并通过将list_b
设置为保存在list_b_orig.
Why not produce them lazily? Keep an index from each list, and when asked for a new element, you produce the combination on the fly. That way, you only need to store the initial list of source elements in memory plus indices at any time.
For example (if you need to iterate over the permutations):
Everytime you reach the maximum of
index_b
, you reset it to0
and incrementindex_a
with one. Then, accessing the Nth element of the lists (where N is the indices) you can recreate any permutation instance.Of course, this implies that you would have to traverse the lists each time a permutation is produced. To avoid this, you could use the lists as indices themselves:
To generate the next permutation, pop the new element from
list_b
and append it to the head oflist_a
. Iflist_b
is empty, remove the head oflist_a
and start over by settinglist_b
to the original which is saved inlist_b_orig
.如果你有一个包含 N 个元素的列表,那么就有 N 个!排列。因此,如果我们能够生成从数字 1 到 N 的映射! (或 0 到 N!-1)以可重现的方式进行这些排列,我们不需要存储 N!元素列表,但只有 N!数字。
但是停下来——我们为什么要存储 N!数字?我们不需要存储它们,因为它们不会改变。我们只需要上限,它由最大元素索引(N)定义,它应该已经存储在您的代码中。
很抱歉我不了解 Erlang,但是我用 Scala 编写了一个函数算法,它允许在可重复的方式。
例如,数字(1 到 12)的第 123456790 个排列是 List (4, 2, 1, 5, 12, 7, 10, 8, 11, 9, 3, 6)。
作为一个特殊的好处,该算法以排序的方式产生排列。仅以可重现的方式查找所有排列但没有顺序更简单:
If you have a List of N elements, there are N! permutations. So if we are able to produce a mapping from the numbers 1 to N! (or 0 to N!-1) to these permutations in a reproducable way, we wouldn't need to store N! lists of elements, but only N! numbers.
But stop - why should we store N! numbers? We don't need to store them, because they don't change. We only need the upper bound, which is defined by the biggest element index, which is N, which should be stored already in your code.
I'm sorry not to know Erlang, but I wrote a functional algorithm in Scala which allows to produce permutations of arbitrary size in a reproducable manner.
For example, the 123456790th permutation of the numbers (1 to 12) is List (4, 2, 1, 5, 12, 7, 10, 8, 11, 9, 3, 6).
As a special bonus, this algorithm produces the permutations in a sorted way. Just finding all permutations in reproduceable way but without order is more simple:
也许压缩它可以完成这项工作?
Zlib 模块似乎做了类似的事情。
Maybe compressing it would do the job?
Zlib module seems to do something like this.