当前位置：文江博客话题详情

如何用C语言编写哈希函数？

发布于 2024-08-21 13:33:54 字数 446 浏览 11 评论 0原文

哈希表被认为是存储/检索数据的最快/最好的方式。

我对哈希表、哈希的理解如下（如果我错了请纠正我，或者如果有更多请补充）：

哈希表只不过是一个数组（单维或多维））来存储值。
散列是在数组中查找索引/位置以插入/检索数据的过程。您获取一个数据项并将其作为键传递给哈希函数，您将获得插入/检索数据的索引/位置。

我有一个问题：

用于存储/检索数据的哈希函数是否与安全应用程序中用于身份验证的加密哈希函数像 MD5、HMAC、SHA-1 等...？

它们有何不同？

如何用C语言编写哈希函数？
有一些标准或指南吗？
我们如何确保哈希函数的输出（即索引）不超出范围？

如果您能提及一些好的链接来更好地理解这些内容，那就太好了。

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

岁月如刀 2024-08-28 13:33:54

加密哈希强调使任何人都难以故意制造冲突。对于哈希表，重点通常是快速生成合理分布的结果。因此，两者通常有很大不同（特别是，加密哈希通常要慢很多）。

对于典型的散列函数，结果仅受类型限制——例如，如果它返回 size_t，则它完全可以返回任何可能的 size_t。您可以将输出范围缩小到表格的大小（例如，使用除以表格大小的余数，这通常应该是素数）。

举个例子，一个相当典型的普通哈希函数可能看起来像这样：

// warning: untested code.
size_t hash(char const *input) { 

    const int ret_size = 32;
    size_t ret = 0x555555;
    const int per_char = 7;

    while (*input) { 
        ret ^= *input++;
        ret = ((ret << per_char) | (ret >> (ret_size - per_char));
   }
   return ret;
}

这里的基本思想是让输入字符串的每一位都影响结果，并且（尽快）让结果的每一位至少受到影响输入的一部分。请注意，我并不是特别推荐它作为一个很棒的哈希函数——只是试图说明您要完成的任务的一些基础知识。

A cryptographic hash emphasizes making it difficult for anybody to intentionally create a collision. For a hash table, the emphasis is normally on producing a reasonable spread of results quickly. As such, the two are usually quite different (in particular, a cryptographic hash is normally a lot slower).

For a typical hash function, the result is limited only by the type -- e.g. if it returns a size_t, it's perfectly fine for it to return any possible size_t. It's up to you to reduce that output range to the size of your table (e.g. using the remainder of dividing by the size of your table, which should often be a prime number).

As an example, a fairly typical normal hash function might look something like:

// warning: untested code.
size_t hash(char const *input) { 

    const int ret_size = 32;
    size_t ret = 0x555555;
    const int per_char = 7;

    while (*input) { 
        ret ^= *input++;
        ret = ((ret << per_char) | (ret >> (ret_size - per_char));
   }
   return ret;
}

The basic idea here is to have every bit of the input string affect the result, and to (as quickly as possible) have every bit of the result affected by at least part of the input. Note that I'm not particularly recommending this as a great hash function -- only trying to illustrate some of the basics of what you're trying to accomplish.

回复收藏 0 原文