哈希碎片和冲突(续)
对于我的这个应用程序,我觉得我可以使用 40 位哈希密钥,这看起来非常低,但看看你是否可以证实我的推理(我想要一个小密钥,因为我想要一个小文件名,并且密钥将被转换为文件名):(
注意:只有意外碰撞才值得关注 - 没有安全问题。)
这里的关键点是,所讨论的群体被分为几组,并且只有当碰撞发生在同一组内时才相关。 “组”是用户系统上的一个目录(文件的内容经过哈希处理,并且仅当冲突发生在同一目录中的文件时才相关)。因此,推测大约 100,000 个潜在用户,比如 2^17,相当于 2^18 个“组”,假设每个用户平均有 2 个目录。因此,使用 40 位密钥,我可以预期在某个用户发生冲突之前(在所有用户中)创建 2^(20+9) 个文件。 (或者 IOW 2^((40+18)/2),由于“生日效应”。)在某处某个用户发生单一冲突之前,2^17 个用户平均为每个用户创建 4096 个唯一文件。然后在另一次碰撞发生之前很久(对吧?)
For this application I've mine I feel like I can get away with a 40 bit hash key, which seems awfully low, but see if you can confirm my reasoning (I want a small key because I want a small filename and the key will be converted to a filename):
(Note: only accidental collisions a concern - no security issues.)
A key point here is that the population in question is divided into groups, and a collision is only relevant if it occurs within the same group. A "group" is a directory on a user's system (the contents of files are hashed and a collision is only relevant if it occurs for files within the same directory). So with speculating roughly 100,000 potential users, say 2^17, that corresponds to 2^18 "groups" assuming 2 directories per user on average. So with a 40 bit key I can expect 2^(20+9) files created (among all users) before a collision occurs for some user somewhere. (Or IOW 2^((40+18)/2), due to the "birthday effect".) That's an average 4096 unique files created per user, for 2^17 users, before a single collision occurs for some user somewhere. And then that long again before another collision occurs somewhere (right?)
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
你的数学看起来很合理,但我想知道你为什么要为此烦恼。如果您想创建唯一的文件名,为什么不为每个用户分配一个编号,并为该用户保留一个序列号。当您需要文件名时,基本上只需将用户号与序列号连接起来(两者都填充到正确的位数)。如果您觉得需要混淆这些数字,请通过 40 位加密运行该结果(这将保证唯一的输入产生唯一的输出)。
例如,如果您为每个文档分配 20 位,则可以让 220 个用户分别创建 220 个文档,而不会发生任何冲突。
如果您不介意对其进行序列化访问,则可以只使用单个 40 位计数器。这样做的优点是,单个用户不会立即用完 220 个序列号,尽管普通用户不太可能创建几乎那么多的文档。
同样,如果您认为由于某种原因需要混淆这个数字,您可以在计数器模式下使用 40 位加密算法(即使用序列号,但对其进行加密),这(再次)保证每个输入映射到唯一的输出。这可以保证不会发生冲突,直到/除非您的用户创建 240 个文档(即仅使用 40 位的最大可能值)。或者,您可以创建一个 40 位全范围线性反馈移位寄存器来创建伪随机 40 位数字。这可能安全性稍差,但优点是实施起来更快、更简单。
Your math looks reasonable, but I'm left wondering why you'd bother with this at all. If you want to create unique file names, why not just assign a number to each user, and keep a serial number for that user. When you need a file name, basically just concatentate the user number with the serial number (both padded to the correct number of digits). If you feel that you need to obfuscate those numbers, run that result though a 40-bit encryption (which will guarantee that a unique input produces a unique output).
If, for example, you assign 20 bits to each, you can have 220 users create 220 documents apiece before there's any chance of a collision at all.
If you don't mind serialized access to it, you could just use a single 40-bit counter instead. The advantage of this is that a single user wouldn't immediately use up 220 serial numbers, even though the average user is unlikely to ever create nearly that many documents.
Again, if you think you need to obfuscate this number for some reason, you can use a 40-bit encryption algorithm in counter mode (i.e. use a serial number, but encrypt it) which (again) guarantees that each input maps to a unique output. This guarantees no collision until/unless your users create 240 documents (i.e., the maximum possible with only 40 bits). Alternatively, you could create a 40-bit full-range linear feedback shift register to create your pseudo-random 40-bit numbers. This might be marginally less secure, but has the advantage of being faster and simpler to implement.