Most efficient way to store a large hexadecimal number (MD5) in a Java object
What is the most efficient way (optimal for both performance and storage space) to store the MD5 sum of a file in a Java (or Groovy) object, given the following use cases:
- I need to compare it with thousands of other MD5 sums.
- I may need to store it in HSQLDB, so that records can be pulled or grouped by MD5.
- It may be stored in Maps as a key.
I am trying to avoid storing it as a String, because String comparisons will be more costly and take more space. Would BigInteger(string, radix) be more efficient? Also, which data type should be used when persisting it in a database?
Create a class that wraps a byte[] and provides no mutation. If you want to use it as a key in a map, it needs either to be Comparable (for a sorted map such as TreeMap) or to implement hashCode() and equals() (for a HashMap). With a byte[], you'll have an easier time computing a simple hash code from the first 32 bits.
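A minimal sketch of such a wrapper, assuming Java 9+ (for Arrays.compareUnsigned); the class name Md5Digest is hypothetical. It copies the array on the way in and out so a key cannot change after it is put in a map, derives hashCode() from the first four bytes, and implements Comparable so it also works in a TreeMap.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

/** Immutable wrapper around a 16-byte MD5 digest (hypothetical name). */
final class Md5Digest implements Comparable<Md5Digest> {
    private final byte[] bytes;

    Md5Digest(byte[] digest) {
        if (digest.length != 16) {
            throw new IllegalArgumentException("MD5 digest must be 16 bytes");
        }
        this.bytes = digest.clone(); // defensive copy: callers cannot mutate the key
    }

    /** Returns a copy so the internal array is never exposed. */
    byte[] toByteArray() {
        return bytes.clone();
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Md5Digest)) {
            return false;
        }
        return Arrays.equals(bytes, ((Md5Digest) o).bytes);
    }

    @Override
    public int hashCode() {
        // Digest bits are already well distributed, so the first 32 bits suffice.
        return ((bytes[0] & 0xFF) << 24) | ((bytes[1] & 0xFF) << 16)
                | ((bytes[2] & 0xFF) << 8) | (bytes[3] & 0xFF);
    }

    @Override
    public int compareTo(Md5Digest other) {
        // Unsigned lexicographic byte order; returns 0 exactly when equals() is true.
        return Arrays.compareUnsigned(bytes, other.bytes);
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] raw = md.digest("hello".getBytes(StandardCharsets.UTF_8));

        Map<Md5Digest, String> index = new TreeMap<>(); // lookups go through compareTo
        index.put(new Md5Digest(raw), "hello.txt");
        // A distinct array with the same contents finds the same entry.
        System.out.println(index.get(new Md5Digest(raw.clone()))); // hello.txt
    }
}
```

The same class works unchanged as a HashMap key, because equals() and hashCode() are consistent with compareTo().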
For comparison speed in Java, storing it as two long values will likely be fastest. For persistence, storage as a byte array makes the most sense, if your database and persistence tools support it. Otherwise, storage as hexadecimal or Base64-encoded text is fairly common and will interoperate well with other applications that access the same database.
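To illustrate the two-long layout, here is a sketch assuming Java 16+ (records); the record name Md5Key is hypothetical. The record gets equals() and hashCode() generated from its two long components, so equality comes down to two primitive comparisons, and toBytes() gives back the 16 bytes for a binary column.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** MD5 digest held as two primitive longs (hypothetical name). */
record Md5Key(long high, long low) {

    /** Splits a 16-byte digest into its first and last 8 bytes. */
    static Md5Key of(byte[] digest) {
        if (digest.length != 16) {
            throw new IllegalArgumentException("MD5 digest must be 16 bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(digest); // big-endian by default
        // Arguments are evaluated left to right, so high gets bytes 0..7.
        return new Md5Key(buf.getLong(), buf.getLong());
    }

    /** Rebuilds the original 16 bytes, e.g. for a binary database column. */
    byte[] toBytes() {
        return ByteBuffer.allocate(16).putLong(high).putLong(low).array();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        byte[] raw = MessageDigest.getInstance("MD5")
                .digest("hello".getBytes(StandardCharsets.UTF_8));

        Map<Md5Key, String> index = new HashMap<>();
        index.put(Md5Key.of(raw), "hello.txt");
        // A key built from an equal but distinct array finds the same entry.
        System.out.println(index.get(Md5Key.of(raw.clone()))); // hello.txt
        System.out.println(Arrays.equals(raw, Md5Key.of(raw).toBytes())); // true
    }
}
```

Using ByteBuffer keeps the byte order explicit, so the same digest always maps to the same pair of longs.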
If you need to perform a lot of comparisons, you could store the MD5 value as two long integers; that way you only need to perform at most 4 logical operations to check it against another MD5 value.
Basically, provide a class that accepts the raw digest data as a byte[] and converts it into a long[]. Compare it with another MD5 long[] array, and, if needed, reconstruct the MD5 byte[] from the long[].
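The code snippets that originally accompanied this answer are missing, so the following is a hypothetical sketch of the three steps (pack, compare, optionally unpack), not the author's code. The names Md5Longs, toLongs, sameDigest and toBytes are illustrative; packing is done by shifting, most significant byte first.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

/** Hypothetical helper: MD5 bytes <-> long[2], plus a cheap equality check. */
final class Md5Longs {

    /** Packs the 16 digest bytes into two longs, most significant byte first. */
    static long[] toLongs(byte[] digest) {
        if (digest.length != 16) {
            throw new IllegalArgumentException("MD5 digest must be 16 bytes");
        }
        long[] result = new long[2];
        for (int i = 0; i < 16; i++) {
            // i >> 3 selects the long (0 or 1); earlier bytes move up by 8 bits.
            result[i >> 3] = (result[i >> 3] << 8) | (digest[i] & 0xFFL);
        }
        return result;
    }

    /** Two long comparisons decide equality. */
    static boolean sameDigest(long[] a, long[] b) {
        return a[0] == b[0] && a[1] == b[1];
    }

    /** Optional: rebuilds the 16 bytes from the two longs. */
    static byte[] toBytes(long[] md5) {
        byte[] digest = new byte[16];
        for (int i = 0; i < 16; i++) {
            // Byte i sits (7 - i % 8) bytes above the low end of its long.
            digest[i] = (byte) (md5[i >> 3] >>> (8 * (7 - (i & 7))));
        }
        return digest;
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] rawA = md.digest("file A".getBytes(StandardCharsets.UTF_8));
        byte[] rawB = md.digest("file B".getBytes(StandardCharsets.UTF_8));

        long[] a = toLongs(rawA);
        long[] b = toLongs(rawB);
        System.out.println(sameDigest(a, a.clone())); // true
        System.out.println(sameDigest(a, b));
        System.out.println(Arrays.equals(rawA, toBytes(a))); // true
    }
}
```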
Note: I am not suggesting converting the byte[] into a long[] for every comparison; this is simply how to store the digest for comparisons. The reconstruction step is optional: you should keep the data as a byte[] and compare only the long[] arrays. In the database, store the data as a 32-character hexadecimal value (the 16 digest bytes encoded as hex).
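For the database side, a sketch of the hex round trip, assuming Java 17+ for java.util.HexFormat; the class name Md5Text is hypothetical. The 32-character string would fit a fixed-width column such as CHAR(32) (an assumed column type, not one named above); Base64 is shown for comparison, since it needs only 24 characters for the same 16 bytes.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Base64;
import java.util.HexFormat;

/** Text encodings of an MD5 digest for a character column (hypothetical name). */
final class Md5Text {

    private static final HexFormat HEX = HexFormat.of(); // lowercase, no delimiters

    /** 16 bytes -> 32 lowercase hex characters. */
    static String toHex(byte[] digest) {
        return HEX.formatHex(digest);
    }

    /** 32 hex characters -> 16 bytes; throws IllegalArgumentException on bad input. */
    static byte[] fromHex(String hex) {
        return HEX.parseHex(hex);
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        byte[] raw = MessageDigest.getInstance("MD5")
                .digest("hello".getBytes(StandardCharsets.UTF_8));

        String hex = toHex(raw);
        System.out.println(hex + " (" + hex.length() + " chars)");
        System.out.println(Arrays.equals(raw, fromHex(hex))); // true

        // Base64 is shorter: 24 characters including "==" padding.
        String b64 = Base64.getEncoder().encodeToString(raw);
        System.out.println(b64 + " (" + b64.length() + " chars)");
    }
}
```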