
How safe is it to rely on hashes for file identification?

Posted on 2024-10-29 00:49:49

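I'm designing cloud storage software on top of a LAMP stack.

Files can have an internal ID. However, there would be many advantages to storing them on the server's file system under a hash as the file name, rather than under an incrementing ID.

Hashes as identifiers in the database would also have a lot of advantages if the currently centralized database ever needs to be sharded or decentralized, or if some kind of master-master high-availability setup is built. But I'm not sure about that yet.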

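Clients can store files under any string (usually some kind of path and file name).

This string is guaranteed to be unique, because the first level is something like a "bucket" that users register, as in Amazon S3 and Google Storage.

My plan is to store each file under the hash of its client-defined path.

That way, the storage server can serve the file directly without asking the database which ID it has. The server can compute the hash, and therefore the file name, on the fly.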
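A minimal sketch of the lookup described above follows. It is written in Python rather than the PHP of the LAMP stack in the question.

Several details are illustrative assumptions, not part of the original design:
- The helper name `storage_name`.
- The NUL byte used to separate the bucket from the path. This assumes neither bucket names nor paths can contain NUL.
- The two-level directory fan-out.

```python
import hashlib


def storage_name(bucket: str, path: str) -> str:
    """Map a (bucket, client path) pair to an on-disk file name.

    Hypothetical helper. Assumes bucket names and paths never contain a NUL
    byte, so joining them with NUL keeps ("a", "b/c") and ("a/b", "c") apart.
    """
    key = bucket.encode("utf-8") + b"\x00" + path.encode("utf-8")
    digest = hashlib.sha1(key).hexdigest()  # 40 hex characters
    # Assumed fan-out into two directory levels ("ab/cd/abcd...") so that
    # no single directory ends up holding millions of entries.
    return f"{digest[:2]}/{digest[2:4]}/{digest}"


print(storage_name("photos", "2024/cat.jpg"))
```

Because the name is a pure function of bucket and path, any storage node can recompute it without a database round trip. This is the property the question relies on.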

But I'm afraid of collisions. I'm currently considering SHA-1 hashes.

I've heard that Git also uses hashes as revision identifiers.
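For comparison, Git names each stored file (a "blob") by hashing its content, not its path. The hash covers a short header (`blob`, the size, a NUL byte) followed by the raw bytes, using SHA-1 in the default object format; newer Git versions can also use SHA-256. The sketch below reproduces that rule. `git_blob_id` is an illustrative name, not a Git API.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object ID Git assigns to file contents in the SHA-1 object format."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


# The well-known ID of the empty blob, as printed by `git hash-object /dev/null`.
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Since Git hashes content, two identical files share one object. The scheme in the question hashes the path instead, so identical files under different paths are stored twice.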

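I know the probability of a collision is really very low, but it is possible.

I just can't judge this. Would you rely on a hash for this purpose?

I could also normalize the path encoding instead, perhaps using base64 of the path as the file name. But I really don't want that, because it could get messy, the names could get too long, and other complications might arise.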
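The length concern can be made concrete. Base64 grows its input by a factor of 4/3. Many file systems (ext4, for example) cap a single file name at 255 bytes, so any path longer than roughly 190 bytes no longer fits. A hex SHA-1 name is always 40 characters.

The sample path and the helper names below are made up for illustration. The URL-safe alphabet is used so the encoded name never contains "/".

```python
import base64
import hashlib

NAME_MAX = 255  # common per-name limit, e.g. on ext4


def base64_name(path: str) -> str:
    # URL-safe alphabet: "-" and "_" instead of "+" and "/".
    return base64.urlsafe_b64encode(path.encode("utf-8")).decode("ascii")


def hash_name(path: str) -> str:
    return hashlib.sha1(path.encode("utf-8")).hexdigest()


long_path = "projects/2024/reports/" + "quarterly-summary-" * 12 + ".pdf"
print(len(long_path), len(base64_name(long_path)), len(hash_name(long_path)))
# 242 324 40
```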


撩动你心 2024-11-05 00:49:50


Assume you have a hash function with "perfect" properties, and that cryptographic hash functions come close to that ideal. Then the theory that applies is the same one that applies to birthday attacks. It says that, for a given maximum number of files, you can make the collision probability as small as you want by using a larger hash digest. SHA-1 has 160 bits, so for any practical number of files the probability of a collision is essentially zero. The standard birthday-attack probability table shows, for example, that a 128-bit hash with 10^10 files has a collision probability below 10^-18. The approximation p ≈ n^2 / 2^(b+1) gives about 1.5 × 10^-19.
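The birthday-bound figures above can be checked with a few lines of Python. This is a sketch that assumes an ideal hash whose outputs are uniformly random. `collision_probability` is a hypothetical helper name.

```python
import math


def collision_probability(n: int, bits: int) -> float:
    """Probability that at least two of n ideal `bits`-bit hashes collide.

    Uses the birthday approximation p = 1 - exp(-n(n-1) / 2^(bits+1));
    expm1 keeps full precision when p is tiny.
    """
    exponent = n * (n - 1) / 2 / 2.0 ** bits
    return -math.expm1(-exponent)


for bits in (128, 160, 256):
    print(bits, collision_probability(10**10, bits))
```

For 10^10 files this gives roughly 1.5e-19 with a 128-bit digest and 3.4e-29 with SHA-1's 160 bits.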

As long as the probability is low enough, I think the solution is good. Compare it with the probability of the planet being hit by an asteroid, of undetectable errors in the disk drive, or of bits flipping in your memory. As long as those probabilities are low enough, we don't worry about them, because they'll "never" happen. Just leave enough margin and make sure the hash isn't the weakest link.
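"Enough margin" can be turned around into a sizing rule: pick the number of files and the acceptable collision probability, then solve p ≈ n^2 / 2^(b+1) for the digest size b. `required_bits` is an illustrative helper under the same ideal-hash assumption.

```python
import math


def required_bits(n: int, p: float) -> int:
    """Smallest digest size b (bits) with n^2 / 2^(b+1) <= p for n files."""
    return math.ceil(math.log2(n * n / (2 * p)))


print(required_bits(10**10, 1e-18))  # 126
print(required_bits(10**10, 1e-30))  # 166
```

At one in 10^18, a 128-bit digest is just enough. Demanding one in 10^30 already needs more than SHA-1's 160 bits but far less than SHA-256 provides.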

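One thing to be concerned about is the choice of hash function and its possible vulnerabilities. For SHA-1 in particular, practical collisions have been demonstrated since 2017, so a SHA-2 function such as SHA-256 is the safer choice today. Is there any other authentication in place, or does the user simply present a path and retrieve a file?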

Think about an attacker trying to brute-force the scenario above by guessing hashes. They would need on the order of 2^128 / 10^10 ≈ 2^95 requests before hitting some other random file stored in the system. That again assumes a 128-bit hash and 10^10 files; you'll probably have far fewer files and a longer hash, which makes it harder still. That is an enormous number, and the speed of any brute-force attempt is further limited by the network and the server. A simple policy of locking the user out after x failed attempts closes this hole further, which is why many systems implement this sort of policy. Building a secure system is complicated and there are many points to consider, but this sort of scheme can be perfectly secure.
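Both parts of that argument can be sketched in a few lines.
- The arithmetic gives the expected number of random guesses per hit.
- A counter implements a lockout after repeated misses.

`LockoutPolicy`, `client_id`, and resetting the counter on a successful lookup are hypothetical design choices. A production policy would also need a time window and persistent storage.

```python
import math
from collections import defaultdict


def guesses_per_hit(bits: int, stored_files: int) -> float:
    """Expected random guesses of a `bits`-bit hash before any stored file matches."""
    return 2.0 ** bits / stored_files


print(round(math.log2(guesses_per_hit(128, 10**10)), 1))  # 94.8


class LockoutPolicy:
    """Hypothetical lockout: refuse a client after max_failures missed lookups."""

    def __init__(self, max_failures: int = 10) -> None:
        self.max_failures = max_failures
        self.failures = defaultdict(int)  # client_id -> consecutive misses

    def is_locked(self, client_id: str) -> bool:
        return self.failures[client_id] >= self.max_failures

    def record_lookup(self, client_id: str, found: bool) -> None:
        if found:
            self.failures[client_id] = 0  # a legitimate hit resets the counter
        else:
            self.failures[client_id] += 1
```

With even a generous limit, a locked-out attacker gets a handful of guesses against a space of roughly 2^95 guesses per hit.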

Hope this is useful...

EDIT: Another way to think about this is that practically every encryption or authentication system relies on certain events having a very low probability for its security. For example, I could get lucky and guess one of the prime factors of a 2048-bit RSA modulus, but it is so unlikely that the system is considered very secure. By contrast, 512-bit RSA keys have been factored in practice and are no longer considered safe.
