二维空间(矩阵)中的编码模式
我有一个 2D MxN 网格(或矩阵)。矩阵中的单元格可以保存整数。具有非零整数的单元格被称为已填充。矩阵中填充的单元格集合称为“配置”。
我想提出一种编码或散列算法,该算法允许我通过计算其编码值(应生成唯一的数字)来唯一标识矩阵中的配置。
与散列相比,我更喜欢编码,因为冲突是完全不受欢迎的。
任何人都可以建议我可以用来计算给定配置的唯一“id”的编码算法吗?
I have a 2D MxN grid (or matrix). The cells in the matrix may hold an integer. A cell with a non-zero integer is said to be populated. The set of populated cells in the matrix is known as a "configuration".
I want to come up with an encoding or hashing algorithm that wil allow me to uniquely identify a configuration in the matrix, by computing its encoded value (which should generate a unique number).
I prefer encoding to hashing, since collisions will be totally undesirable.
Can anyone suggest an encoding algorithm I can use to compute a unique "id" for a given configuration?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(6)
我建议使用哈希算法,该算法有 99.999999999% 的机会生成唯一 ID。在大多数情况下,每十亿个哈希发生一次冲突是可以接受的。我的建议是使用 CRC 算法,因为它生成高度分布式的哈希集并具有碰撞率相对较低。
I would suggest using a hashing algorithm that will have a 99.999999999% chance of generating a unique ID. In most scenarios, it is acceptable to have a collision every billionth hash. My suggestion is to use the CRC algorithm, since it generates highly distributed set of hashes and has a relatively low rate of collisions.
那么它是一个 1 和 0 的数组吗?该数组的 RLE 或 LZW 压缩怎么样?
So it's an array of 1s and 0s? How about an RLE or LZW compression of that array?
目前尚不清楚您想要实现什么,但也许是一个自定义 bloom 过滤器 实施可能适合您的问题。
It's not clear what you want to achieve but maybe a custom bloom filter implementation might be suitable to your problem.
根据您要解决的问题,我会想到以下内容:
Depending on what problem you're trying to solve, the follwoing comes to mind:
是否发生碰撞并不重要。即使存在冲突,你也可以继续检查矩阵 int by int ,看看它实际上是否相似。
只要冲突很少发生,开销就是 0,
因此散列函数可以像将所有 int 添加在一起一样简单。如果这足够了,它取决于整数的可能值以及它们的数量(因此,如果在整个矩阵中只有 1 或 2 个单元格有值,则此散列将不起作用)
It should not matter if there is a collsion or not. Even if there is a collision, you can then continue to check the matrix int by int to see if it in fact is similar.
As long as collisions happen very infrequently the overhead is 0
so a hashing function could be just as simple as to add all int's together. If this is sufficient it depends on the possible values of the ints and how many of them there are (So if in the entire matrix only 1 or 2 cells have a value, this hashing won't work)
对于提供精确比较的表示来说,不可能比表示配置的比特序列的最佳压缩做得更好。
如果您需要将 MxN 布尔值唯一编码为整数,则需要 2M*N 值。使用平台的固定精度整数是否可行取决于 M 和 N 的大小;如果不是,你将不得不使用一个位字符串或一个大整数。
由于原始数据是任何整数值而不仅仅是 1 或 0,因此朴素矩阵的朴素位串 ID 将提供
8 * sizeof (matrix::cell_type)
的压缩。针对稀疏值优化的位串可能会更好。稀疏位串的良好实现会压缩数据,因此这将减少表示的存储空间,并允许快速精确的比较,这是要求。如果保证模式稀疏到一定程度,则可以对信息进行压缩进行优化,但需要提供更多信息。
例如,您是否使用稀疏矩阵表示(带状、对角线、行压缩等),并且可以访问稀疏矩阵存储的内部结构,那么它将自然而有效地映射到压缩位串。
查看您的其他帖子,似乎矩阵被用作游戏的网格而不是矩阵。在这种情况下,最好对位串使用游程长度压缩,因为这会产生另一个有用的属性 - 其配置是彼此翻译的矩阵的编码位串表示仅在编码的第一个值上有所不同。
For a representation which affords exact comparison is not possible to do better than an optimal compression of a sequence of bits which represent the configuration.
If you require MxN boolean values to be uniquely encoded in an integer, you require 2M*N values. Whether that's feasible using your platform's fixed precision integers depends on the size of M and N; if not you'll have to use a bit string or a big integer.
Because the original data is any integer value rather than just 1 or 0, a naive bitstring ID of a naive matrix will give a compression of
8 * sizeof ( matrix::cell_type )
. A bitstring optimised for sparse values could be better. Good implementations of sparse bitstrings compress the data, so this will reduce the storage space of the representation, and allow fast exact comparison, which are the requirements.If the patterns are guaranteed sparse to a certain level, there are optimisations you can do compressing the information, but you need to give more information.
For example, are you using a sparse matrix representation ( banded, diagonal, row compressed etc ), and have access to the internals of the sparse matrix storage then that will naturally and efficiently map to a compressed bitstring.
Looking at your other post, it seems that the matrix is being used as the grid for a game rather than as a matrix. In that case it's probably best to use a run length compression on the bitstring, as this yields another useful property - the encoded bitstring representation of the matrices whose configurations are translations of each other will only differ in the first value of the encoding.