Is there a perfect hash function for the combined input sets of IMEI numbers and MAC addresses? (C implementation)
I'm looking for a hash function that I can use to give uniform unique IDs to devices that connect to our network either using a GSM modem or an ethernet connection.
So for any given device I have either an IMEI number or a MAC address hard-coded that I can use to generate the hash.
I've been researching hash functions for the last few hours, reading up on the different non-cryptographic and cryptographic hashes that I might want to use. I care more about a low collision rate than about performance, as the hash will not be calculated very often.
My front-runners are MD5, FNV-1a, MurmurHash2, Hsieh, and DJB.
Whatever hash I use will have to be implemented in C and will be used on a microcontroller with a tiny processor.
I know that the trick to choosing a good hash function for your needs is knowing what sort of input you're going to be feeding it.
The reason I'm asking this question is that the idea popped into my head that both IMEI and MAC have finite lengths and ranges, so perhaps there exists a fairly simple hash function that can cover the full sets of both and not have collisions. (Thus, a perfect hash function)
An IMEI number is 15 decimal digits long (12-13 bytes in hex?), and a MAC address is 6 bytes. Mulling it over, I don't think you would have collisions between the two sets of input numbers, but feel free to correct me if that is wrong. If you did, could you do something to prevent it? Add some seed to one of the sets?
Am I on the right track? Is finding a perfect hash function for these combined sets possible?
Thanks!
Update
Thanks for the answers and comments. I ended up using the identity function ;) as my hash function, and then also using a mask since there is potential overlap across the sets of numbers.
IMEI, IMEISV, and MAC will all fit in 6.5 bytes or less, so I am storing my values in 7 bytes and then doing a bitwise OR on the first byte with a mask based on which set the number comes from, to ensure they are unique across all sets.
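For readers who want to see the tagging idea concretely, here is a minimal sketch in C. It is not the code from the post. It assumes the identifier is first converted to an integer, and that the top 2 bits of the 7-byte ID carry a tag while the low 54 bits carry the value (a 48-bit MAC, a 15-digit IMEI below 10^15, or a 16-digit IMEISV below 10^16 all fit in 54 bits). The names `id_kind`, `digits_to_u64`, `mac_to_u64`, and `make_device_id`, and the tag values, are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tag values, one per identifier set (not from the original post). */
enum id_kind { ID_KIND_MAC = 0, ID_KIND_IMEI = 1, ID_KIND_IMEISV = 2 };

/* Assumption: 54 value bits, since the largest 16-digit number needs 54 bits. */
#define ID_VALUE_BITS 54u
#define ID_VALUE_MASK ((UINT64_C(1) << ID_VALUE_BITS) - 1)

/* Parse a string of exactly `ndigits` decimal digits; returns 0 on success, -1 otherwise. */
static int digits_to_u64(const char *s, size_t ndigits, uint64_t *out)
{
    uint64_t v = 0;
    for (size_t i = 0; i < ndigits; i++) {
        if (s[i] < '0' || s[i] > '9')
            return -1;              /* also catches strings that are too short */
        v = v * 10u + (uint64_t)(s[i] - '0');
    }
    if (s[ndigits] != '\0')
        return -1;                  /* string is too long */
    *out = v;
    return 0;
}

/* Pack the 6 MAC bytes (most significant first) into the low 48 bits. */
static uint64_t mac_to_u64(const uint8_t mac[6])
{
    uint64_t v = 0;
    for (size_t i = 0; i < 6; i++)
        v = (v << 8) | mac[i];
    return v;
}

/* Write the 7-byte ID big-endian; the top 2 bits of id[0] hold the tag. */
static void make_device_id(enum id_kind kind, uint64_t value, uint8_t id[7])
{
    uint64_t packed = ((uint64_t)kind << ID_VALUE_BITS) | (value & ID_VALUE_MASK);
    for (int i = 6; i >= 0; i--) {
        id[i] = (uint8_t)(packed & 0xFFu);
        packed >>= 8;
    }
}
```

With this layout, a MAC and an IMEI that happen to have the same numeric value still produce different 7-byte IDs, because they differ in the tag bits of the first byte. No hashing is involved, so the mapping is reversible and cannot collide as long as the inputs stay within the assumed ranges.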
Comments (1)
There's no way to make a perfect hash over an unknown, growing input set. You could simply make the field one bit larger than whichever of IMEI or MAC is larger, and use that bit to flag which type of identifier it is, along with the entire IMEI/MAC. Anything smaller will have collisions, but they're probably quite rare.
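The "one bit larger" suggestion can be checked with a rough count. As a simplifying assumption, treat every 15-digit string as a possible IMEI and every 48-bit value as a possible MAC (the real sets are smaller, since the last IMEI digit is a check digit):

```latex
\[
|\text{IMEI}| + |\text{MAC}| \le 10^{15} + 2^{48} \approx 1.28 \times 10^{15},
\qquad
2^{50} \approx 1.13 \times 10^{15} \;<\; 1.28 \times 10^{15} \;<\; 2^{51} \approx 2.25 \times 10^{15}
\]
```

Under that assumption, any collision-free ID needs at least 51 bits, which is exactly the 50 bits of the larger identifier (the IMEI) plus one flag bit.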