What is the best way to detect duplicate credit card numbers without storing them?
I run a website where we mark certain accounts as scammers, and "flag" their account and all credit cards used as being bad. We don't store the actual credit card numbers; we store an MD5 checksum of each one instead.
We are hitting collisions all the time now. What is the best way to store these values so that they are non-reversible, but can still be compared against future values?
I thought MD5 would be the best, but we've got a debate going on here...
A cryptographically secure hash would work. (SHA512 or SHA256 would be OK)
However, I would use a fairly secret salt that is not stored along with the cards (to prevent any sort of rainbow table attack).
PS:
Rainbow table attacks against credit cards could be particularly effective, since the total size of the plain-text space is quite small due to the limited character set, the fixed size, and the check digits.
PPS:
You can't use a random salt for each entry, because you would never be able to feasibly check for duplicates. Per-entry salts exist to make identical inputs hash to different values, whereas here we specifically want identical card numbers to produce the same value.
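To illustrate the point about salts, here is a minimal Python sketch. The function names, the placeholder `SECRET_SALT`, and the test card number are assumptions made for illustration; they are not part of the answer above. It shows that a fresh random salt per entry makes the same card hash differently each time, while one secret salt shared by all entries keeps the values comparable.

```python
import hashlib
import os
from typing import Tuple


def hash_with_random_salt(card_number: str) -> Tuple[bytes, str]:
    """Password-style hashing: a fresh random salt for every entry."""
    salt = os.urandom(16)
    digest = hashlib.sha256(salt + card_number.encode("ascii")).hexdigest()
    return salt, digest


def hash_with_fixed_salt(card_number: str, secret_salt: bytes) -> str:
    """One secret salt shared by all entries, stored apart from the hashes."""
    return hashlib.sha256(secret_salt + card_number.encode("ascii")).hexdigest()


# Hypothetical placeholder; in practice this would be loaded from somewhere
# other than the database that holds the hashes.
SECRET_SALT = b"hypothetical-secret-loaded-from-elsewhere"
card = "4111111111111111"  # a commonly used test number, not a real card

_, first = hash_with_random_salt(card)
_, second = hash_with_random_salt(card)
fixed_first = hash_with_fixed_salt(card, SECRET_SALT)
fixed_second = hash_with_fixed_salt(card, SECRET_SALT)

print(first == second)              # random salts: the two digests differ
print(fixed_first == fixed_second)  # fixed secret salt: the digests match
```

With random salts, the only way to test a new card against the list would be to rehash it once per stored salt, which is what makes duplicate checks impractical at scale.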
It isn't sufficiently safe to just use a good hash algorithm. If your list is stolen, your stored hashes can be used to retrieve working card information. The actual space of valid credit card numbers is small enough that a determined attacker can also pre-calculate many of the possible hashes ahead of time, and this may have other implications for your system if there is an intrusion or an inside job.
I recommend you use a salt and also calculate a 2nd value to be added to the salt based on a formula involving each digit of the card number and the first salt value. This assures that if you lose control of either part, you still have reasonable uniqueness that renders ownership of the list useless. The formula should not be heavily weighted toward the first 6 digits of the card (BIN number), though, and no trace of the formula should be stored in the same location as either the salt or the final hash.
Consider the anatomy of a 16-digit credit card number:
6 digit BIN (Bank Identification Number)
9 digit Account Number
1 digit Luhn Checksum
BIN lists are well known within the processing industry and are not too difficult to assemble for those with access to an illicit list of card numbers. The number of valid BINs is further diminished by the assigned space for each issuer.
Visa - Starts with 4
American Express - Starts with 34 / 37
MasterCard - Starts with 5
Discover/CUP - Starts with 6
Diners Club - Starts with 36 / 38 (35 belongs to JCB)
etc.
Note that some of the assigned BIN information within each issuer category is also sparse. If an attacker is aware of where most of your customers are located, then that will cut down the uniqueness considerably, as BIN information is assigned on a per-bank basis. An attacker could simply open an account at a small bank in a wealthy neighborhood and use the BIN on his own card as a starting point.
The checksum digit is calculated with a well-known formula, so that is immediately discardable as a source of unique data.
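The well-known formula in question is the Luhn algorithm. As a rough sketch (the function name is my own), the check digit can be derived from the other digits like this, which is why it adds no uniqueness:

```python
def luhn_check_digit(partial: str) -> int:
    """Return the Luhn check digit for a number that is missing its last digit."""
    total = 0
    # Walk right to left. The rightmost digit of `partial` sits next to the
    # check digit, so it is the first one to be doubled.
    for index, char in enumerate(reversed(partial)):
        digit = int(char)
        if index % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return (10 - total % 10) % 10


print(luhn_check_digit("411111111111111"))  # check digit for a common test number
```

Given the first 15 digits of a 16-digit number, the 16th is fully determined, so an attacker only ever has to guess the account digits.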
Armed with a handful of BINs worth targeting, an attacker has to check 9 digits at a time for each BIN set. This is 1 billion checksum and hash operations per set. I don't have any benchmarks handy, but I'm pretty sure 1 million hash operations per minute is not unreasonable for MD5 or any flavor of SHA on a suitably powerful machine. At that rate one BIN takes about 1,000 minutes, which amounts to less than a day to crack all matches under a given BIN.
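The estimate can be checked with a few lines of arithmetic. The hash rate below is the answer's own rough guess, not a benchmark, and the constant names are only illustrative:

```python
CANDIDATES_PER_BIN = 10 ** 9    # 9 unknown account digits; the check digit is derived
HASHES_PER_MINUTE = 1_000_000   # rough guess from the text, not a measured figure

minutes = CANDIDATES_PER_BIN / HASHES_PER_MINUTE
hours = minutes / 60

print(f"{minutes:.0f} minutes, or about {hours:.1f} hours per BIN")
```

A faster machine only shortens this, so the figure is best read as an upper bound on the attacker's effort.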
Finally, you might consider storing a timestamp or visitor token (IP/subnet) with your hashes as well. It is nice to catch duplicate card numbers, but also consider the ramifications of someone stuffing your system with bogus card numbers. At some point you need to decide on a trade-off between blocking card numbers that you know are invalid and giving yourself a mechanism to identify and repair misuse.
For example, a disgruntled employee could be stealing card information on his own and then use your hash mechanism against you by inserting valid hashes into your card number blacklist to block repeat business. It is quite expensive to undo this if you are just storing a hash, because everything is opaque once it has been converted to a hash. With this in mind, give yourself a method to identify the source of the hash as well.
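One way to keep the source identifiable is to store a small record alongside each hash. This is only a sketch: the `BlacklistEntry` fields and helper names are hypothetical, and a real system would presumably keep them in database columns rather than in memory.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class BlacklistEntry:
    card_fingerprint: str  # keyed hash of the card number, never the number itself
    added_at: datetime     # when the entry was created
    added_by: str          # account or operator that flagged the card
    source_ip: str         # visitor token, e.g. IP or subnet


def make_entry(card_fingerprint: str, added_by: str, source_ip: str) -> BlacklistEntry:
    """Create an entry stamped with the current UTC time."""
    return BlacklistEntry(card_fingerprint, datetime.now(timezone.utc), added_by, source_ip)


def entries_added_by(entries: List[BlacklistEntry], added_by: str) -> List[BlacklistEntry]:
    """Find everything one operator inserted, e.g. to review or roll it back."""
    return [entry for entry in entries if entry.added_by == added_by]
```

If an insider is later found to have poisoned the blacklist, the entries attributed to that person can be pulled out without having to reverse any hash.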
Perhaps you can store two different hashes of the card number. The chance that both hashes will collide for the same pair of cards is practically zero.
Use SHA1; at the time of writing, no hash collisions had been found.
People pointing out that a hash is "broken" are missing the point, perhaps regurgitating something they've heard without understanding what it means. When people talk about hashes being 'broken' they typically mean that it is possible to easily generate an alternate payload that computes to the same hash.
This 'breaks' the hash, but only for the specific purpose of using a hash to verify that data is what it's supposed to be.
That isn't what matters here: someone managing to create an alternate data stream that happens to hash down to the same value as one of the credit cards doesn't achieve anything meaningful or useful in terms of an attack vector.
The risk with hashes here is that the problem space for credit card numbers is pretty low and rainbow tables for them would be pretty cheap and easy to generate.
Adding a salt would add a bit of protection against already generated rainbow tables for pure card numbers but the extent to which it offers any real protection depends on how 'secret' the salt would remain in the case you are compromised. If the salt is exposed then new rainbow tables can then be cheaply generated and it's all over.
Given that the salt needs to be available to the application for it to perform checks against the blacklist, there's a good chance someone compromising the blacklist data will also be able to get to the salt. If you have multiple servers you can mitigate that to some degree by ensuring the salt and the data aren't in the same 'place', so an exposure of one server won't give someone all of the parts they need. (Similarly, for backups, don't store the data and the salt on the same media, where someone can walk away with one tape and get everything.) The salt only adds some protection while it is secret (in this type of use).
If you have the resources to do it securely then I think that is the route to go. If you are getting a significant number of collisions on any reasonable hash function, you must be doing a lot of volume. (In fact, I'm highly surprised collisions would be a problem even then; any reasonable hash function should provide diverse results over a small problem space like this.)
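A back-of-the-envelope birthday estimate supports that surprise. The sketch below assumes the hash behaves like a uniformly random function; the function name and the item counts are illustrative, not taken from the question.

```python
def expected_collisions(num_items: int, digest_bits: int) -> float:
    """Approximate expected number of colliding pairs among num_items random digests."""
    pairs = num_items * (num_items - 1) // 2
    return pairs / 2 ** digest_bits


# A billion stored cards against a full 128-bit digest such as MD5.
print(expected_collisions(10 ** 9, 128))

# The same volume against a 32-bit checksum, for contrast.
print(expected_collisions(10 ** 9, 32))
```

Under that assumption, frequent collisions would point to something other than the hash function itself, such as a truncated digest, a short checksum, or a bug in how values are stored or compared.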
As others have said, HMAC should be the way to go.
HMAC-SHA-256 with a proper key should:
But there is one more very important thing:
It is with good reason that you are not storing the credit card numbers. Even if you could be 100% sure that you are using proper encryption, you probably still would not store credit card numbers. Why? For one thing, because the key could be leaked.
So you store hashes, so that the credit card number cannot be retrieved. ...Right?
Well, if you use a plain hash, a simple rainbow table with hashes of all possible credit card numbers gives away all the original data that you presumably did not store. Oops. But this you knew by now.
So we try to do better. Let's say using individual salts is better, and using HMAC is the best approach we know.
Consider the following scenario:
This leaves 5 digits to be brute-forced. That is a meager 100'000 attempts.
If we have used the individual salts, it's game over. We can simply brute-force each individual card number at an average of 50'000 attempts.
If we have used HMAC, we appear to be safe. But remember... we chose not to store encrypted card numbers, because even with perfect encryption, the key could be leaked. Guess what. Our HMAC key can be leaked just the same. With the key, again, we can brute-force each individual card number at an average of 50'000 attempts. So a leaked key gives us the credit card numbers, just as it would if we had stored encrypted card numbers.
As such, because of the low entropy of credit card numbers, storing hashes does not add much security compared to encrypted values (yet PCI limits the key rotation requirement to encryption).
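To make the brute-force argument concrete, here is a hedged Python sketch. It assumes a scenario in which the attacker holds the leaked HMAC key, the stored tag, the 6-digit BIN, and the last four digits (as a masked card number would reveal); those inputs and all function names are my assumptions for illustration. Six digits remain unknown, and the Luhn check cuts the candidates down to 100,000.

```python
import hashlib
import hmac
from typing import Optional


def fingerprint(key: bytes, card_number: str) -> bytes:
    """HMAC-SHA-256 tag of a card number."""
    return hmac.new(key, card_number.encode("ascii"), hashlib.sha256).digest()


def luhn_valid(number: str) -> bool:
    """Check a full card number against the Luhn formula."""
    total = 0
    for index, char in enumerate(reversed(number)):
        digit = int(char)
        if index % 2 == 1:  # every second digit, counting from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def recover_card(key: bytes, target: bytes, bin_prefix: str, last_four: str) -> Optional[str]:
    """Try every 6-digit middle section; only Luhn-valid candidates are hashed."""
    for middle in range(10 ** 6):
        candidate = f"{bin_prefix}{middle:06d}{last_four}"
        if not luhn_valid(candidate):
            continue
        if hmac.compare_digest(fingerprint(key, candidate), target):
            return candidate
    return None


leaked_key = b"hypothetical-leaked-key"
stored_tag = fingerprint(leaked_key, "4111111111111111")  # common test number
print(recover_card(leaked_key, stored_tag, "411111", "1111"))
```

Even in plain Python this search finishes in seconds, which is the point being made: once the key is out, the keyed hash offers little more protection than reversible encryption would.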
A bit of perspective:
Ok, we're assuming a leaked key here. Extreme. But then again, so does PCI as part of their reasoning to forbid you from storing credit card numbers, so we should at least consider it.
True, I did not take into account the multiple guesses to find the BIN. It should be a small constant, though. Or we could limit ourselves to one BIN.
Definitely, a PCI auditor may be more forgiving than I am.
Yes, if you do not store the masked card number, you are a factor 10'000 safer. This helps a lot. Use it to your advantage. Still, if 50K attempts are doable, 500M may be doable, too. It's not enough to make me consider the data secure, in the context of a compromised key.
Conclusion:
Use HMAC-SHA-256. Understand the risk. Store as little as possible. Protect your keys vigilantly. Spend a fortune on a Hardware Security Module :-)
If you are finding collisions with MD5, why not use a better algorithm such as SHA1 or SHA256?
MD5 is NOT the way to go since it's broken. To quote Bruce Schneier: "[w]e already knew that MD5 is a broken hash function" and "no one should be using MD5 anymore."
I.e. use SHA512 or SHA256 as someone already proposed.
As Henri already mentioned above (+1), the right solution is to use a Message Authentication Code such as HMAC with a secret key. This is exactly the "secret salt" someone mentioned before. (BTW, salts are always public.)
Use a standard construction such as HMAC-SHA-256 (RFC 2104, FIPS-198a), keep the key secret and store the results (authentication tags) in a database.
The larger digest size (256 bits) of SHA-256 should prevent any collisions from happening. SHA-256 is a fairly good hash function, and a random collision is only expected after on the order of 2^128 hashed values, so if you ever encounter a collision in your system, please let me know! :)
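A minimal sketch of that construction using Python's standard `hmac` and `hashlib` modules follows. The key literal, the `normalize` helper, and the in-memory `blacklist` set are placeholders assumed for illustration; a real deployment would load the key from a secrets store and keep the tags in a database.

```python
import hashlib
import hmac


def normalize(card_number: str) -> str:
    """Strip spaces and dashes so the same card always yields the same tag."""
    return "".join(ch for ch in card_number if ch.isdigit())


def card_fingerprint(key: bytes, card_number: str) -> str:
    """HMAC-SHA-256 authentication tag of the normalized card number, as hex."""
    message = normalize(card_number).encode("ascii")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


# Hypothetical placeholder key; keep the real one away from the stored tags.
HMAC_KEY = b"hypothetical-key-from-a-secrets-manager"

# Only tags are stored, never card numbers.
blacklist = {card_fingerprint(HMAC_KEY, "4111 1111 1111 1111")}


def is_blacklisted(card_number: str) -> bool:
    return card_fingerprint(HMAC_KEY, card_number) in blacklist


print(is_blacklisted("4111-1111-1111-1111"))  # same card, different formatting
print(is_blacklisted("5555555555554444"))     # a different test number
```

Normalizing before hashing matters here, because any stray space or dash would otherwise produce a completely different tag and a missed duplicate.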
Using the strongest hash possible is usually good. Speed is not of the essence and slowness actually works against anyone trying a brute force reversal of your hashed values.
I like Whirlpool, personally. If you're using PHP, check out the supported algorithms in the hash function docs.
Whirlpool returns a hex string 128 characters long, but you don't necessarily have to store all of it. The first 32 or 64 characters would suffice. You could also consider sha512 or sha384.
Don't bother doing salts, just use HMACs. I know it's kind of an abuse, but then you get a decent keyed hash, so you can prevent collisions and rainbow table attacks.
The nice thing here is that even if the key leaks, nobody can decrypt it. The best attack that works against HMACs is brute force. Actually, the key here plays the role of the salt mentioned earlier. The other nice thing is that the algorithm is a little better than the usual salting done by most non-security programmers.