Recording the uniqueness of IP addresses without storing the IP address itself, to protect privacy

Posted 2024-10-09 05:57:33

In a web application, when logging some data, I'd like to make sure I can identify data that came at different times but from the same IP address. On the other hand, for privacy reasons (the data will be released publicly), I'd like to make sure the actual IP cannot be retrieved. So I need a one-way mapping of IP addresses to some other strings that guarantees a 1-1 mapping.

If I understand correctly, MD5, SHA1 or SHA256 could be a solution. Are they too expensive in terms of the processing needed?

I'd be interested in any solution, though an implementation in Perl would be even better.

Comments (6)

还如梦归 2024-10-16 05:57:33

I'd think MD5 would be good and fast enough. You'd want to add a few constant characters of salt to avoid rainbow table/web lookups. For instance, the string "127.0.0.1" has the MD5 hash f528764d624db129b32c21fbca0cb8d6, which has quite a few Google hits. "szabgab127.0.0.1", on the other hand, gets "Your search - 501ff2fbdca6ee72247f8c61851f17b9 - did not match any documents" (until I post this answer...)
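
The constant-salt idea above could be sketched in Perl roughly as follows. This is only an illustration: the sub name salted_md5_token and the choice to prepend the salt are assumptions, and the salt value simply reuses the string from the answer.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical constant salt (taken from the answer above); it must stay
# out of the published data for the lookup protection to mean anything.
my $SALT = 'szabgab';

# Map an address string to a stable 32-character hex token.
sub salted_md5_token {
    my ($ip) = @_;
    return md5_hex($SALT . $ip);
}

my $unsalted = md5_hex('127.0.0.1');           # the widely indexed value
my $token    = salted_md5_token('127.0.0.1');  # same input, same token every time
print "$unsalted\n$token\n";
```

Because the salt is constant, the same address always yields the same token, which is what makes repeat visits recognisable. The flip side is that anyone who learns the salt can hash all 2^32 IPv4 addresses and reverse the mapping.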

甜味拾荒者 2024-10-16 05:57:33

Use Rabin fingerprinting. It is fast and easy to implement.

Given an n-bit message m_0, ..., m_{n-1}, we view it as a polynomial f(x) = m_0 + m_1 x + ... + m_{n-1} x^{n-1} of degree n-1 over the finite field GF(2).

We then pick a random irreducible polynomial p(x) of degree k over GF(2), and we define the fingerprint of m to be the remainder r(x) after division of f(x) by p(x) over GF(2), which can be viewed as a polynomial of degree k-1 or as a k-bit number.
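
The definition quoted above amounts to polynomial long division with XOR in place of subtraction. A minimal Perl sketch, under several assumptions: the sub name rabin_fingerprint is hypothetical, the first bit of the message is treated as the highest-degree coefficient (a convention chosen here for simplicity), and the modulus is a fixed toy polynomial of degree 8 rather than a randomly chosen one.

```perl
use strict;
use warnings;

# Remainder of the message polynomial modulo $poly over GF(2).
# $poly must include its leading term, i.e. bit $k is set.
sub rabin_fingerprint {
    my ($message, $poly, $k) = @_;
    my $r = 0;
    for my $byte (unpack 'C*', $message) {
        for my $bit (reverse 0 .. 7) {
            $r = ($r << 1) | (($byte >> $bit) & 1);  # shift in the next coefficient
            $r ^= $poly if $r & (1 << $k);           # subtract p(x) once degree k is reached
        }
    }
    return $r;   # fits in k bits
}

# Toy modulus: x^8 + x^4 + x^3 + x + 1 (0x11B), irreducible over GF(2).
my ($POLY, $K) = (0x11B, 8);
printf "%02x\n", rabin_fingerprint('127.0.0.1', $POLY, $K);
```

Two caveats on this sketch. A degree of 8 yields only 256 distinct fingerprints, so it shows the mechanics only; a usable k would be much larger. Also, if the raw 32-bit IPv4 address were used as the message with k of 32 or more, the remainder would be the address itself, so the string form is fingerprinted here.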

Note that this is still not the perfect (collision-free) hash function you are asking for, but a function that guarantees a 1-1 mapping is likely to be easier to crack, letting someone recover the original IP from the hash. In most cases, the extremely low chance of collision in fingerprinting is acceptable.

Also note that whatever hash function you end up using, it will be trivial to find which log entries are from a given IP address if your hash function is known. If you want to secure yourself against this, you should encrypt the hash.

青朷 2024-10-16 05:57:33

Building on the answers of @marcog and @daxim, you could use an HMAC, for example HMAC-SHA, with a hard-coded secret key on the log generation device. If the secret leaks out, the scheme becomes about as weak as any of the ones given here so far.
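
A minimal Perl sketch of the HMAC approach, using hmac_sha256_hex from the core Digest::SHA module. The sub name hmac_token and the secret value are placeholders, and SHA-256 is just one possible choice of "HMAC-SHA".

```perl
use strict;
use warnings;
use Digest::SHA qw(hmac_sha256_hex);

# Hypothetical secret; in practice it would be long, random, and loaded
# from protected configuration on the logging host.
my $SECRET = 'replace-with-a-long-random-secret';

# Keyed one-way token for an address string (64 hex characters).
sub hmac_token {
    my ($ip, $key) = @_;
    return hmac_sha256_hex($ip, $key);
}

print hmac_token('10.10.10.10', $SECRET), "\n";
```

The token is stable for a given key, so repeat addresses can still be matched, while someone without the key cannot simply hash candidate addresses and compare.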

Or, perhaps more simply, you can use the same secret key concept to encrypt the IP address. AES's 128-bit block size is perfect for ensuring a 1-1 mapping of all possible IP addresses, since even an IPv6 address fits in a single block. Just use AES in ECB mode.

够运 2024-10-16 05:57:33

If you just use hashes, then someone can do a brute force attack.

The easiest thing to do is to use a Bloom filter. In particular, the C++ Bloom filter implementation at http://www.afflib.org/ allows you to add arbitrary strings to the Bloom filter and then probe to see whether they are present. If you want to protect against a brute force attack, just raise your false positive frequency so it is 1 in a billion. Then you'll have uniqueness, but people won't be able to figure out which IP addresses you have seen.
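
For illustration, a Bloom filter can also be sketched in a few lines of core Perl. This is not the afflib implementation mentioned above; the sub names, the filter size, and the use of two words of a SHA-256 digest to derive the bit positions are all assumptions made for the example.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256);

# Create an empty filter with $bits bit positions and $hashes probes per item.
sub bloom_new {
    my ($bits, $hashes) = @_;
    return { bits => $bits, hashes => $hashes, vec => '' };
}

# Derive the probe positions from two 32-bit words of the SHA-256 digest.
sub bloom_positions {
    my ($f, $item) = @_;
    my ($h1, $h2) = unpack 'N2', sha256($item);
    return map { ($h1 + $_ * $h2) % $f->{bits} } 0 .. $f->{hashes} - 1;
}

# Set every probe position for the item.
sub bloom_add {
    my ($f, $item) = @_;
    vec($f->{vec}, $_, 1) = 1 for bloom_positions($f, $item);
    return;
}

# True if every probe position is set (may be a false positive).
sub bloom_contains {
    my ($f, $item) = @_;
    for my $pos (bloom_positions($f, $item)) {
        return 0 unless vec($f->{vec}, $pos, 1);
    }
    return 1;
}

my $seen = bloom_new(1 << 20, 7);
bloom_add($seen, '10.10.10.10');
print bloom_contains($seen, '10.10.10.10') ? "seen before\n" : "new\n";
```

Note that a filter like this only answers "has this address been seen?". It does not by itself give each log entry a token that links it to other entries from the same address.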

怼怹恏 2024-10-16 05:57:33

⚠ Do not use MD5 or SHA-1 any more. ⚠ See the articles for their weaknesses.

Use salted SHA-2 instead; Crypt::SaltedHash provides a nice abstraction. The recommended Perl binding is Digest::SHA, which uses XS.

You talk about expensive. Have you profiled the code yet? Code not yet written? Then it's way too early to think about optimisation. Security must be the first concern.
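
One rough way to answer the cost question is simply to measure it. The sketch below times the core Digest::MD5 and Digest::SHA functions over generated dotted-quad strings; the sub name seconds_per_hash and the iteration count are arbitrary choices, and the figures it prints will vary by machine.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);
use Digest::MD5 qw(md5_hex);
use Digest::SHA qw(sha1_hex sha256_hex);

# Average wall-clock seconds per call of $hash over $count address strings.
sub seconds_per_hash {
    my ($hash, $count) = @_;
    my $start = time();
    for my $n (1 .. $count) {
        my $ip = join '.', unpack 'C4', pack 'N', $n;   # e.g. 0.0.0.1
        $hash->($ip);
    }
    return (time() - $start) / $count;
}

for my $case ([MD5 => \&md5_hex], [SHA1 => \&sha1_hex], ['SHA-256' => \&sha256_hex]) {
    my ($name, $hash) = @$case;
    printf "%-8s %.2e s/hash\n", $name, seconds_per_hash($hash, 100_000);
}
```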


Edit: example code

use strict;
use warnings;
use Crypt::SaltedHash;

my $normalised_string_representation_of_internet_address = '::1';    # or perhaps '10.10.10.10'

# when you first get an address, make a hash and store it
# (a fresh random salt is generated each time, so two hashes of the
# same address differ as strings and must be compared with validate)
my $csh = Crypt::SaltedHash->new(algorithm => 'SHA-512', salt_len => 32);
$csh->add($normalised_string_representation_of_internet_address);
my $salted = $csh->generate;

# later retrieve the hash and see whether it matches
my $valid = Crypt::SaltedHash->validate($salted, $normalised_string_representation_of_internet_address, 32);

过去的过去 2024-10-16 05:57:33

Another option is Crypt::Eksblowfish::Bcrypt. The reason it's "better", however, is precisely that it is (eks)pensive, with a tunable cost, which makes cracking attempts anywhere from somewhat to ludicrously impractical. For your application you could cache the crypted IPs, so that at least repeat addresses would not be slow.
