Help designing a hash function to detect duplicate records?

Posted 2024-10-05 06:54:50

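Let me explain my plan so far. This is a Rubik's Cube solver. I am given a scrambled cube (the initial state), which becomes the root node of a graph. I am using iterative deepening depth-first search to "brute force" the scrambled cube into a recognizable state, which I can then solve with pattern recognition.

As you can imagine, this is a very large graph, so I'd like to come up with some kind of hash function to detect duplicate nodes in it (and thereby speed up traversal).

I'm largely unfamiliar with hash functions, but here is what I'm thinking. Each node is essentially a different state of the cube, so if I encounter a cube state (node) I have already seen, I want to skip it. I therefore need a hash function that maps the state variable to a checksum, where the state variable is a 54-character string. The only allowed characters are y, r, g, o, b and w (corresponding to the colors).

Any help designing this hash function would be greatly appreciated.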

梦行七里 2024-10-12 06:54:50

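To detect and remove duplicates as fast as possible, first avoid generating many of the duplicate positions at all. This is easy to do and is faster than generating them and then looking for duplicates. For example, if you have moves such as F and B, then if you allow the subsequence FB, do not allow BF, which gives the same result. If you have just done 3F, do not follow it with another F. You can build a small lookup table of allowed next moves based on the last 3 moves.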
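The pruning rule above can be sketched as a small predicate. This is a simplified illustration that looks only at the previous face rather than the last 3 moves, and it assumes one "move" covers any amount of turn of a face; the enum names and the function `move_allowed` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical face numbering; opposite faces are paired as (U,D), (L,R), (F,B). */
enum face { FACE_U, FACE_D, FACE_L, FACE_R, FACE_F, FACE_B, FACE_NONE };

/* Returns true if a turn of `next` may follow a turn of `last`. */
bool move_allowed(enum face last, enum face next)
{
    if (last == FACE_NONE)
        return true;                 /* first move of a sequence */
    if (next == last)
        return false;                /* two turns of one face collapse into one */
    /* Opposite faces commute, so keep only one order: F then B, never B then F. */
    if (last / 2 == next / 2 && next < last)
        return false;
    return true;
}
```

In the search loop, the generator would skip every candidate move for which `move_allowed(previous_face, candidate_face)` is false, so those duplicates are never created.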

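For the remaining duplicates you want a fast hash, because there are a lot of positions. To make the hash faster, as others have commented, you want what it hashes, that is, the representation of the position, to be small. There are 12 edge cubes and 8 corner cubes. Representing the position and orientation of each cube takes only 5 bits, for a total of 100 bits (12.5 bytes). For edges, 4 bits give the position and 1 bit the flip. For corners, 3 bits give the position and 2 bits the rotation. You can ignore the last edge cube, because its position and flip are fixed by the other edge cubes. That leaves 95 bits, so the position is now down to 12 bytes.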
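A minimal sketch of that 12-byte packing follows. The `struct cube` layout and the function names are assumptions made for illustration; the answer does not prescribe how the piece positions are stored before packing.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical piece-level state: where each piece sits and how it is turned. */
struct cube {
    uint8_t edge_pos[12];     /* 0..11 */
    uint8_t edge_flip[12];    /* 0 or 1 */
    uint8_t corner_pos[8];    /* 0..7 */
    uint8_t corner_twist[8];  /* 0..2 */
};

/* Append the low `nbits` bits of `value` to `out`, starting at bit *bitpos. */
void put_bits(uint8_t *out, unsigned *bitpos, unsigned value, unsigned nbits)
{
    for (unsigned i = 0; i < nbits; i++, (*bitpos)++)
        if (value & (1u << i))
            out[*bitpos / 8] |= (uint8_t)(1u << (*bitpos % 8));
}

/* Pack 11 edges and 8 corners at 5 bits each: 95 bits, stored in 12 bytes. */
void pack_cube(const struct cube *c, uint8_t key[12])
{
    unsigned bitpos = 0;
    memset(key, 0, 12);
    for (int i = 0; i < 11; i++) {            /* the 12th edge is implied */
        put_bits(key, &bitpos, c->edge_pos[i], 4);
        put_bits(key, &bitpos, c->edge_flip[i], 1);
    }
    for (int i = 0; i < 8; i++) {
        put_bits(key, &bitpos, c->corner_pos[i], 3);
        put_bits(key, &bitpos, c->corner_twist[i], 2);
    }
}
```

Two states are then equal exactly when `memcmp(key_a, key_b, 12) == 0`, provided both come from valid cubes.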

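A Rubik's Cube position holds about 70 bits of real information (log2 of the roughly 4.3 × 10^19 reachable positions is about 65), and 96 bits is close enough to that figure that hashing those bits further would actually be counterproductive. In other words, treat this representation of the board as your hash. That may sound a little strange, but from your question I gather that you are also working with a much less compact cube representation that suits your pattern matching better. In that case you can treat the 12-byte value as a hash, with the advantage that it is a hash that never collides. This makes the code for the duplicate test and for inserting new values shorter, simpler and faster. It will also be cheaper than the MD5 solutions suggested so far.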
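One way to use the 12-byte value directly as the stored key is a small open-addressing table, sketched below. The table size, the FNV-1a function used to pick a starting slot, and all names here are assumptions for illustration; because the full key is stored and compared, two different positions are never confused.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define KEY_BYTES   12
#define TABLE_SLOTS (1u << 12)   /* assumed capacity; must be a power of two */

struct visited_set {
    uint8_t keys[TABLE_SLOTS][KEY_BYTES];
    bool    used[TABLE_SLOTS];
};

/* 32-bit FNV-1a over the key bytes, used only to choose a starting slot. */
uint32_t slot_of(const uint8_t key[KEY_BYTES])
{
    uint32_t h = 2166136261u;
    for (int i = 0; i < KEY_BYTES; i++) {
        h ^= key[i];
        h *= 16777619u;
    }
    return h & (TABLE_SLOTS - 1);
}

/* Returns 1 if the key was newly inserted, 0 if already present, -1 if the table is full. */
int visited_insert(struct visited_set *s, const uint8_t key[KEY_BYTES])
{
    uint32_t i = slot_of(key);
    for (uint32_t probes = 0; probes < TABLE_SLOTS; probes++) {
        if (!s->used[i]) {
            memcpy(s->keys[i], key, KEY_BYTES);
            s->used[i] = true;
            return 1;
        }
        if (memcmp(s->keys[i], key, KEY_BYTES) == 0)
            return 0;                          /* exact match: a true duplicate */
        i = (i + 1) & (TABLE_SLOTS - 1);       /* linear probing */
    }
    return -1;
}
```

A search would call `visited_insert` on each new node's key and skip the node when the result is 0. The set must start zeroed, for example by declaring it `static`.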

There are many other tricks you can use to reduce the work of searching for duplicate positions. See http://cube20.org/ for ideas.

挖鼻大婶 2024-10-12 06:54:50

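You can always try a cryptographic hash function. Since your problem is not a security problem (there is no attacker deliberately trying to find distinct states that hash to the same value), you can use a broken hash function. I suggest trying MD4, which is quite fast. Your 54-character string is a good fit for MD4 input (MD4 can process up to 55 bytes of input as a single block).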

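A basic 2.4 GHz PC, using a single core, can hash about 12 million such strings per second with a simple unrolled C implementation (for example, one like the MD4Transform() function in the sample code included in RFC 1320). That is probably enough for your needs.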

み零 2024-10-12 06:54:50

1) Don't use a hash
A Rubik's Cube has 9*6 = 54 separate faces (stickers). Even wastefully using 1 byte per face, that is 432 bits, so hashing won't save you much space. A tighter packing of 3 bits per face comes to 162 bits (21 bytes). It sounds to me like what you need is a compact way to represent the cube.
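The 3-bits-per-face packing can be sketched as follows, assuming the 54-character state string from the question with the letters y, r, g, o, b, w. The colour-to-number mapping and the function names are arbitrary choices for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Map a colour letter to 0..5, or -1 for any other character. */
int colour_code(char c)
{
    const char *letters = "yrgobw";
    const char *p = c ? strchr(letters, c) : NULL;
    return p ? (int)(p - letters) : -1;
}

/* Pack 54 faces at 3 bits each into 21 bytes (162 bits used).
   Returns 0 on success, -1 if the string contains an invalid character. */
int pack_facelets(const char state[54], uint8_t out[21])
{
    memset(out, 0, 21);
    for (int i = 0; i < 54; i++) {
        int code = colour_code(state[i]);
        if (code < 0)
            return -1;
        for (int b = 0; b < 3; b++) {
            int bit = i * 3 + b;               /* position in the output bit stream */
            if (code & (1 << b))
                out[bit / 8] |= (uint8_t)(1u << (bit % 8));
        }
    }
    return 0;
}
```

The 21-byte result can be compared with `memcmp` or used as a key in a set, with no hashing step required to shrink it.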

OTOH, if you want to store a set of very many previously visited states, I've found that using a Bloom filter instead of a true set gives decent (but often non-optimal) results with much lower space utilization.
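For readers unfamiliar with Bloom filters, here is a minimal sketch. The filter size, the number of probes (4) and the seeded FNV-1a hash are assumptions chosen for brevity, not tuned values. A Bloom filter can report a state as seen when it was not (a false positive), which is why the search results may be non-optimal, but it never forgets a state that was added.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BLOOM_BITS (1u << 20)   /* assumed filter size: 2^20 bits = 128 KiB */

struct bloom { uint8_t bits[BLOOM_BITS / 8]; };

/* 32-bit FNV-1a with a caller-chosen seed, so two seeds give two hash values. */
uint32_t fnv1a(const uint8_t *data, size_t len, uint32_t seed)
{
    uint32_t h = 2166136261u ^ seed;
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 16777619u;
    }
    return h;
}

/* Adds the key to the filter; returns true if it was possibly present already. */
bool bloom_test_and_add(struct bloom *f, const uint8_t *key, size_t len)
{
    uint32_t h1 = fnv1a(key, len, 0);
    uint32_t h2 = fnv1a(key, len, 0x9e3779b9u) | 1u;   /* odd step for double hashing */
    bool seen = true;
    for (uint32_t k = 0; k < 4; k++) {
        uint32_t bit = (h1 + k * h2) % BLOOM_BITS;
        if (!(f->bits[bit / 8] & (1u << (bit % 8))))
            seen = false;                               /* at least one bit was unset */
        f->bits[bit / 8] |= (uint8_t)(1u << (bit % 8));
    }
    return seen;
}
```

The filter must start with all bits zero (for example a `static struct bloom`); a node is skipped when `bloom_test_and_add` returns true.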

2) If you are married to the idea of a hash:
Just use MD5. It's slightly more compact than the proposed cube states (128 bits versus 162), rather fast, and has good collision properties; it's not like you have a malicious adversary trying to cause Rubik's Cube hash collisions ;-).

EDIT: Using cryptographic hash functions, such as MD4/MD5, is usually simple once you have a library or function implementing the algorithm (e.g. OpenSSL or GNU TLS; many stand-alone implementations also exist). Usually the function is something like void md5(unsigned char *buf, size_t len, unsigned char *digest), where digest points to a pre-allocated 16-byte buffer and buf is the data to be hashed (your cube structure). Here is some untested C code:

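#include <stdio.h>
#include <string.h>
#include <openssl/md5.h>

int main(void)
{
    /* Placeholder state: a solved cube, 9 stickers per colour (54 characters) */
    const char *buf = "yyyyyyyyyrrrrrrrrrgggggggggooooooooobbbbbbbbbwwwwwwwww";
    unsigned char digest[MD5_DIGEST_LENGTH];    /* 16 bytes */

    MD5((const unsigned char *)buf, strlen(buf), digest);    /* This is the OpenSSL function */
    for (int i = 0; i < MD5_DIGEST_LENGTH; i++)
        printf("%02x", digest[i]);              /* print the digest as hex */
    printf("\n");
    return 0;
}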

And be sure to compile/link with -lcrypto (MD5 lives in OpenSSL's libcrypto, not libssl).

玩物 2024-10-12 06:54:50

8 corner cubes:

You can assign each of these corners to 8 positions which each require 3 bits to determine which corner cube is at which position for a total of 24 bits.
You can further reduce this to just recording 7-of-8 positions as you can easily use a process of elimination to determine what the 8th corner is (for 21 bits).
However, this can be reduced further as the 8 corners can only be arranged in 8! = 40320 permutations and 40320 can be represented in 16 bits.
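Turning a permutation into a single number in the range 0 to 8! - 1 can be done with its Lehmer code. The sketch below is one common way to do that; the function name `perm_rank` is made up here, and the same function also covers the 12 edges, since 12! - 1 still fits in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Rank a permutation of 0..n-1 (n <= 12) in lexicographic order.
   For n = 8 the result is 0..40319 (16 bits); for n = 12 it needs 29 bits. */
uint32_t perm_rank(const uint8_t *perm, int n)
{
    uint32_t rank = 0;
    for (int i = 0; i < n; i++) {
        uint32_t smaller = 0;              /* later entries smaller than perm[i] */
        for (int j = i + 1; j < n; j++)
            if (perm[j] < perm[i])
                smaller++;
        /* Horner form of sum(smaller_i * (n-1-i)!) */
        rank = rank * (uint32_t)(n - i) + smaller;
    }
    return rank;
}
```

The quadratic inner loop is acceptable for n = 8 or 12; faster ranking schemes exist if this turns out to be a bottleneck.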

Each corner cube can be oriented correctly or be rotated 120° clockwise or anti-clockwise, giving three different orientations (represented as 0, 1 and 2 respectively).

This requires 2 bits per corner to represent.
However, the sum of the orientations (modulo 3) is always 0; so, if you know 7-of-8 orientations then (assuming you have a solvable cube) you can calculate the orientation of the 8th corner (giving a total of 14 bits).
Or for a further reduction, seven ternary (base 3) digits can represent the orientation of the corners and this can be represented in 12 binary digits (bits).
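The base-3 encoding of the corner orientations can be sketched like this. The function names are hypothetical, and the decoder assumes a solvable cube, so that the eighth orientation follows from the sum being 0 modulo 3.

```c
#include <assert.h>
#include <stdint.h>

/* Encode the first 7 corner orientations (each 0, 1 or 2) as a base-3 number. */
uint16_t twist_encode(const uint8_t twist[8])
{
    uint16_t code = 0;
    for (int i = 0; i < 7; i++)            /* the 8th orientation is implied */
        code = (uint16_t)(code * 3 + twist[i]);
    return code;                           /* 0..2186, fits in 12 bits */
}

/* Recover all 8 orientations; the 8th makes the total a multiple of 3. */
void twist_decode(uint16_t code, uint8_t twist[8])
{
    int sum = 0;
    for (int i = 6; i >= 0; i--) {
        twist[i] = (uint8_t)(code % 3);
        sum += twist[i];
        code /= 3;
    }
    twist[7] = (uint8_t)((3 - sum % 3) % 3);
}
```

Since 3^7 = 2187 lies between 2^11 and 2^12, 12 bits is the smallest whole number of bits that holds every code.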

So the corner cubes can be represented in 28 bits (16 + 12) if you are willing to decode the permutation, or in 33 bits (21 + 12) if you want to directly record the positions of 7-of-8 corners.

12 edge cubes:

Each edge cube's position can be represented in 4 bits (48 bits in total), which can be reduced to 44 bits by recording the positions of only 11-of-12 edges.
However, the 12! = 479001600 permutations of the edges can be stored in 29 bits.

Each edge can be either oriented correctly or flipped:

This requires 1 bit to represent.
However, edges are always flipped in pairs so the parity of the flipped edges will always be zero (again, meaning that you only need to record 11-of-12 orientations for the edges) giving a total of 11 bits required.
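The 11-bit flip encoding can be sketched in the same spirit. The function names are hypothetical, and a solvable cube is assumed, so the twelfth flip is whatever makes the number of flipped edges even.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the flips of edges 0..10 into 11 bits; edge 11 is implied by parity. */
uint16_t flip_encode(const uint8_t flip[12])
{
    uint16_t code = 0;
    for (int i = 0; i < 11; i++)
        code |= (uint16_t)((flip[i] & 1u) << i);
    return code;
}

/* Recover all 12 flips from the 11-bit code. */
void flip_decode(uint16_t code, uint8_t flip[12])
{
    uint8_t parity = 0;
    for (int i = 0; i < 11; i++) {
        flip[i] = (uint8_t)((code >> i) & 1u);
        parity ^= flip[i];
    }
    flip[11] = parity;   /* makes the total number of flipped edges even */
}
```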

So the edge cubes can be represented in 40 bits (29 + 11) if you are willing to decode the permutation, or in 55 bits (44 + 11) if you want to record all the positions and flips of 11-of-12 edges.

6 centre cubes

You do not need to record any information about the centre cubes: they are fixed relative to the ball at the centre of the Rubik's cube, so (assuming you are not worried about the orientation of any logos on the cube) they are immobile.

Total: