Recommendation for a low-memory hashmap implementation in Java
I am currently working on a programming-related problem where I am attempting to build a massive hashmap of data. The key for the data is a custom low-memory implementation of a CharSequence that implements hashCode() and equals(...), and the value is an Integer object.
There may be millions of entries in this hashtable. I managed to drastically reduce memory use for the value by having the Integer be a pointer into a file where the data I wish to hash is stored, but the problem is that the keys may be tens of bytes each (on average 25 bytes), and the keys need to be held in memory in the default implementation of HashMap.
I need a hashmap that has a low memory overhead and that can possibly page the keys to disk or alternatively store a hashed representation of the keys. If the keys are themselves hashed then I would be concerned about hash collisions.
Ideally, I would like to be able to store a million entries in the map per 50MB of heap space (one byte array of 25 bytes in the key and Integer object in the value part).
Does anyone have any experience with low-memory filesystem-backed Maps that are optimised for reducing the footprint of the keys?
Thanks,
Chris
You could use Java's hash map and write a FileKey class that takes a RandomAccessFile, offset and length, precomputes the hash at construction and which implements Comparable by reading the data from the file just for the compare.
In conjunction with a simple MRU cache, you could keep some number of keys in memory using another hashmap which is keyed on the same keys, but which uses a custom comparator which compares just the offset and length values (not the file data).
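A minimal sketch of the FileKey idea described above (the class name, field layout, and synchronization on the RandomAccessFile are my assumptions, not a tested design): the key holds only a file handle, an offset, and a length, precomputes its hash from the bytes on disk at construction time, and goes back to the file for the byte-by-byte comparison only when equals() is actually called, i.e. on a hash collision.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical FileKey: ~16 bytes of fields per key instead of the key's
// full byte content. The hash is precomputed so routine bucket lookups
// never touch the disk; only equals() re-reads the underlying bytes.
final class FileKey {
    private final RandomAccessFile file;
    private final long offset;
    private final int length;
    private final int hash; // precomputed at construction

    FileKey(RandomAccessFile file, long offset, int length) throws IOException {
        this.file = file;
        this.offset = offset;
        this.length = length;
        this.hash = Arrays.hashCode(readBytes());
    }

    private byte[] readBytes() throws IOException {
        byte[] buf = new byte[length];
        synchronized (file) { // RandomAccessFile's position is shared state
            file.seek(offset);
            file.readFully(buf);
        }
        return buf;
    }

    @Override public int hashCode() { return hash; }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof FileKey)) return false;
        FileKey other = (FileKey) o;
        if (hash != other.hash || length != other.length) return false;
        try {
            // Disk reads happen only when hashes already match.
            return Arrays.equals(readBytes(), other.readBytes());
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```

For a HashMap you need equals()/hashCode() as shown; Comparable would additionally be needed if you wanted the keys in a sorted structure such as a TreeMap.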
How about Berkeley DB Java Edition? Its StoredMap class looks like what you are looking for.
I think that the default HashSet is not a bad way to go--make the key-value pair yourself (so you don't have to wrap them in an extra object). It is pretty memory-efficient that way; it really only requires about (1/loadFactor)^(3/2)*4 bytes more memory on top of your key object + 4 bytes for the value. In practice, this should add something like 8 bytes of overhead per entry. (You can reduce this further if you know in advance how many keys you're going to store.)
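A sketch of the "roll the pair yourself" idea from this answer (the Entry class and its field names are illustrative assumptions): the combined object holds the key bytes and a primitive int value, and hashes/compares on the key alone, so a HashSet of these behaves like a map without HashMap's per-entry Node wrapper or Integer boxing.

```java
import java.util.Arrays;
import java.util.HashSet;

// Hypothetical combined key-value entry. hashCode()/equals() look only at
// the key, so two entries with the same key bytes collide regardless of
// their values -- which is exactly what map semantics require.
final class Entry {
    final byte[] key; // e.g. the ~25-byte encoded form of the CharSequence
    int value;        // file offset kept as a primitive int, no boxing

    Entry(byte[] key, int value) { this.key = key; this.value = value; }

    @Override public int hashCode() { return Arrays.hashCode(key); }

    @Override public boolean equals(Object o) {
        return o instanceof Entry && Arrays.equals(key, ((Entry) o).key);
    }
}
```

One caveat: HashSet has no get(), so to retrieve the stored value you would query with a probe entry via a HashMap<Entry, Entry> that maps each entry to itself; internally that costs the same as a HashSet, which is itself backed by a HashMap.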