Recording IP address uniqueness without storing the IP address itself, for privacy
In a web application, when logging some data, I'd like to make sure I can identify data that came at different times but from the same IP address. On the other hand, for privacy reasons, since the data will be released publicly, I'd like to make sure the actual IP cannot be retrieved. So I need some one-way mapping of IP addresses to other strings that ensures a 1-1 mapping.
If I understand correctly then MD5, SHA1 or SHA256 could be a solution. I wonder if they are not too expensive in terms of processing needed?
I'd be interested in any solution, though if there is an implementation in Perl, that would be even better.
Comments (6)
I'd think MD5 would be good and fast enough. You'd want to add a few constant characters of salt to avoid rainbow table/web lookups. For instance, the string "127.0.0.1" has the MD5 digest f528764d624db129b32c21fbca0cb8d6, which has quite a few Google hits. "szabgab127.0.0.1", on the other hand, gets "Your search - 501ff2fbdca6ee72247f8c61851f17b9 - did not match any documents" (until I post this answer...)
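As a rough illustration of the constant-salt idea above, here is a minimal Python sketch. The function name `pseudonym` and the `SALT` constant are assumptions made for this example; the salt value simply reuses the "szabgab" prefix from the answer.

```python
import hashlib

# Hypothetical constant salt; the answer uses "szabgab" as its example prefix.
SALT = "szabgab"

def pseudonym(ip: str, salt: str = SALT) -> str:
    """Return the hex MD5 digest of the salt followed by the IP string."""
    return hashlib.md5((salt + ip).encode("ascii")).hexdigest()

if __name__ == "__main__":
    print(pseudonym("127.0.0.1", salt=""))  # unsalted: easy to find via web search
    print(pseudonym("127.0.0.1"))           # salted: not in generic lookup tables
```

The same IP always yields the same string as long as the salt stays constant, which is what lets log entries be grouped. The salt has to stay private for this to help.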
Use Rabin fingerprinting. It is fast and easy to implement.
Note that this is still not the perfect hash function you are looking for, but with a truly perfect (collision-free) one you are likely to run into the problem that the function can be cracked and the original IP recovered from the hash. In most cases, the extremely low chance of collision in fingerprinting is acceptable.
Also note that whatever hash function you end up using, it will be trivial to find which log entries are from a given IP address if your hash function is known. If you want to secure yourself against this, you should encrypt the hash.
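To make the point about a known hash function concrete, the following Python sketch precomputes the fingerprints of every address in a small network and then looks up a published value. SHA-256 is used only as a stand-in for any publicly known, unkeyed function; the names `fingerprint` and `build_lookup` and the 192.0.2.0/24 documentation range are assumptions for this example.

```python
import hashlib
import ipaddress

def fingerprint(ip: str) -> str:
    # Stand-in for any publicly known, unkeyed hash of the address.
    return hashlib.sha256(ip.encode("ascii")).hexdigest()

def build_lookup(network: str) -> dict:
    """Map fingerprint -> address for every address in the given network."""
    return {fingerprint(str(addr)): str(addr)
            for addr in ipaddress.ip_network(network)}

# A value as it might appear in the published log.
published = fingerprint("192.0.2.77")

# Anyone who knows the function can enumerate candidates and invert it.
table = build_lookup("192.0.2.0/24")
recovered = table.get(published)

if __name__ == "__main__":
    print(recovered)
```

The whole IPv4 space has only 2^32 addresses, so the same enumeration over all of it is well within reach of ordinary hardware.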
Building on the answers of @marcog and @daxim, you could use an HMAC, for example HMAC-SHA, with a hard-coded secret key on the log generation device. If the secret leaks out, then the scheme becomes about as weak as any of the ones given here so far.
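A minimal Python sketch of the HMAC approach, using the standard `hmac` and `hashlib` modules with SHA-256 as the underlying hash. The key value and the function name `keyed_pseudonym` are placeholders chosen for this example, not part of the answer.

```python
import hashlib
import hmac

# Hypothetical secret; in practice this would be a long random value
# stored only on the machine that writes the logs.
SECRET_KEY = b"replace-with-a-long-random-secret"

def keyed_pseudonym(ip: str, key: bytes = SECRET_KEY) -> str:
    """Return the hex HMAC-SHA256 of the IP string under the given key."""
    return hmac.new(key, ip.encode("ascii"), hashlib.sha256).hexdigest()

if __name__ == "__main__":
    print(keyed_pseudonym("192.0.2.1"))
```

Without the key, an attacker cannot precompute the mapping by enumerating addresses, yet the same IP still maps to the same string in every log entry.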
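Or, perhaps more simply, you can just use the same secret key concept to encrypt the IP address. AES's 128-bit block size is perfect for ensuring 1-1 mappings of all possible IP addresses: an IPv6 address is exactly 128 bits and a padded IPv4 address also fits in one block, and a block cipher is a permutation of its inputs, so no two addresses can collide. Just use AES in ECB mode.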
If you just use hashes, then someone can do a brute force attack.
The easiest thing to do is to use a Bloom Filter. In particular, the C++ Bloom filter implementation at http://www.afflib.org/ allows you to add arbitrary strings to the Bloom filter and then probe to see if they are present or not. If you want to protect against a brute force attack just raise your false positive frequency so it is 1 in a billion. Then you'll have uniqueness but people won't be able to figure out which IP addresses you have seen.
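For readers unfamiliar with the structure, here is a small self-contained Bloom filter sketch in Python. It is not the afflib implementation mentioned above; the class name, the sizing parameters, and the use of SHA-256 with double hashing to derive bit positions are all assumptions made for illustration.

```python
import hashlib

class BloomFilter:
    """A tiny Bloom filter: membership tests with possible false positives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive num_hashes bit positions from one digest (double hashing).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

if __name__ == "__main__":
    seen = BloomFilter(num_bits=8192, num_hashes=5)
    seen.add("192.0.2.1")
    print("192.0.2.1" in seen)
```

Note that a filter like this only answers "has this address been seen before?"; it does not by itself produce a per-address string to publish alongside each log entry.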
⚠ Do not use MD5 or SHA-1 any more. ⚠ See the articles for their weaknesses.
Use salted SHA-2 instead; Crypt::SaltedHash provides a nice abstraction. The recommended Perl binding is Digest::SHA, which uses XS.
You talk about expensive. Have you profiled the code yet? Code not yet written? Then it's way too early to think about optimisation. Security must be the first concern.
Edit: example code
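The snippet below is not the answer author's code; it is a minimal Python sketch of salted SHA-256 together with a quick measurement of its cost, in the spirit of the profiling advice above. The fixed `SALT` value and the function names are assumptions for this example. A fixed salt is assumed because the question needs the same IP to map to the same string every time.

```python
import hashlib
import timeit

# Hypothetical fixed salt; it must stay constant so that equal IPs
# produce equal digests across log entries.
SALT = b"example-salt"

def salted_sha256(ip: str) -> str:
    """Return the hex SHA-256 digest of the salt followed by the IP string."""
    return hashlib.sha256(SALT + ip.encode("ascii")).hexdigest()

def seconds_per_hash(runs: int = 100_000) -> float:
    """Measure the average wall-clock time of one salted hash."""
    total = timeit.timeit(lambda: salted_sha256("192.0.2.1"), number=runs)
    return total / runs

if __name__ == "__main__":
    print(f"{seconds_per_hash() * 1e6:.2f} microseconds per hash")
```

Running a measurement like this on the actual logging host is a simple way to check whether the hashing cost matters next to the rest of the request handling.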
Another option is Crypt::Eksblowfish::Bcrypt. The reason it's "better" however is precisely because it is (eks)pensive—how expensive is tunable—which makes cracking attempts anywhere from somewhat to ludicrously impractical. For your application you could cache the crypted IPs so it wouldn't be slow when duplicates were seen at least.
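The "deliberately expensive, but cached" idea can be sketched in Python as follows. Note the substitution: this uses PBKDF2-HMAC-SHA256 from the standard library rather than bcrypt, purely so the example has no third-party dependencies. The salt, iteration count, cache size, and function name are assumptions for illustration.

```python
import hashlib
from functools import lru_cache

# Hypothetical fixed salt and tunable cost; a higher ITERATIONS value makes
# each hash (and each brute-force guess) proportionally slower.
SALT = b"example-salt"
ITERATIONS = 100_000

@lru_cache(maxsize=65536)
def slow_pseudonym(ip: str) -> str:
    """Return a deliberately expensive hash of the IP; repeats hit the cache."""
    derived = hashlib.pbkdf2_hmac("sha256", ip.encode("ascii"), SALT, ITERATIONS)
    return derived.hex()

if __name__ == "__main__":
    print(slow_pseudonym("192.0.2.1"))  # computed
    print(slow_pseudonym("192.0.2.1"))  # served from the cache
    print(slow_pseudonym.cache_info())
```

The cache maps plain IPs to their hashes in memory, so it should live only inside the logging process and never be written out with the published data.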