Java: Large persistent hash structure?
I'm looking for a persistent hash structure in Java: a simple key-value store where the key is a unique string and the value is an int. The value of a key should be incremented each time an existing key is added to the store.
I need this to be quite large, possibly 500 million to 1 billion keys. I've been evaluating tokyo-cabinet http://fallabs.com/tokyocabinet/javadoc/ but I'm not sure how well it will scale: insert times seem to be getting longer as the hash grows.
Any ideas on what might be appropriate?
Thanks
Edit: To reduce disk I/O, I'm going to cache data in an in-memory HashMap, then update the persistent hash in one go when the cache grows to a certain size.
Edit2: One of the reasons for the persistence is that I have limited RAM (4GB), so I can't fit a big structure into memory.
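The caching idea from the first edit could be sketched roughly as follows. This is only an illustration under assumptions: `CounterStore`, `BufferedCounter`, `InMemoryStore` and the `maxBufferedKeys` threshold are hypothetical names, and `CounterStore` stands in for whichever persistent store ends up being used.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the persistent store (Tokyo Cabinet, a database, ...). */
interface CounterStore {
    /** Adds delta to the stored count for key, treating a missing key as 0. */
    void addTo(String key, int delta);
}

/** Collects increments in a HashMap and writes them out in one batch. */
final class BufferedCounter {
    private final Map<String, Integer> buffer = new HashMap<>();
    private final CounterStore store;
    private final int maxBufferedKeys; // assumed tuning knob, not from the question

    BufferedCounter(CounterStore store, int maxBufferedKeys) {
        this.store = store;
        this.maxBufferedKeys = maxBufferedKeys;
    }

    /** Counts one occurrence of key; flushes once enough distinct keys are buffered. */
    void increment(String key) {
        buffer.merge(key, 1, Integer::sum);
        if (buffer.size() >= maxBufferedKeys) {
            flush();
        }
    }

    /** Pushes every buffered delta to the store, then empties the buffer. */
    void flush() {
        for (Map.Entry<String, Integer> entry : buffer.entrySet()) {
            store.addTo(entry.getKey(), entry.getValue());
        }
        buffer.clear();
    }
}

/** In-memory store used only to exercise the buffer; not persistent. */
final class InMemoryStore implements CounterStore {
    final Map<String, Integer> counts = new HashMap<>();
    int writes = 0; // number of addTo calls, to show how writes are batched

    @Override
    public void addTo(String key, int delta) {
        counts.merge(key, delta, Integer::sum);
        writes++;
    }
}
```

Repeated keys collapse into a single write per flush, which is where the I/O saving would come from. Counts still sitting in the buffer are lost if the process dies before `flush()` runs, so a final `flush()` on shutdown is needed.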
Comments (5)
I think Megamap is what you are looking for: http://megamap.sourceforge.net/. Here is a short description of Megamap from its homepage:
Use a database, not a hash. Even for a database, 500M rows is getting quite large. How many updates are you expecting per second?
Have you checked out Berkeley DB Java Edition? They have a Collections-compatible API (see also the Javadoc for StoredMap).
So, if I understand correctly, Redis might be an option. You can issue INCR [key] commands to atomically increment the value associated with that key. If the key does not exist, it is set to zero and then incremented (resulting in one). According to the docs, INCR is a constant-time operation. Speed is a primary design goal for Redis.
Redis is able to persist its data to a file, and you can control the parameters for how that happens.
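To make the INCR call concrete, here is a minimal sketch that speaks the Redis wire protocol (RESP) directly over a socket, using only the JDK. The class and method names (`RedisIncrSketch`, `encodeCommand`, `parseIntegerReply`, `incr`) are made up for illustration; in real code you would more likely use an existing Java client library, and would reuse one connection instead of opening one per call.

```java
import java.io.BufferedReader;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class RedisIncrSketch {

    /** Encodes a command as a RESP array of bulk strings. */
    static byte[] encodeCommand(String... args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeAscii(out, "*" + args.length + "\r\n");
        for (String arg : args) {
            byte[] bytes = arg.getBytes(StandardCharsets.UTF_8);
            writeAscii(out, "$" + bytes.length + "\r\n"); // length in bytes, not chars
            out.write(bytes, 0, bytes.length);
            writeAscii(out, "\r\n");
        }
        return out.toByteArray();
    }

    private static void writeAscii(ByteArrayOutputStream out, String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        out.write(bytes, 0, bytes.length);
    }

    /** Parses a RESP integer reply line such as ":42" (line terminator already removed). */
    static long parseIntegerReply(String line) throws IOException {
        if (line == null || !line.startsWith(":")) {
            // Error replies start with '-', for example when the value is not an integer.
            throw new IOException("Unexpected reply: " + line);
        }
        return Long.parseLong(line.substring(1));
    }

    /** Sends INCR for key and returns the new value; opens one connection per call. */
    static long incr(String host, int port, String key) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            OutputStream os = socket.getOutputStream();
            os.write(encodeCommand("INCR", key));
            os.flush();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII));
            return parseIntegerReply(in.readLine());
        }
    }
}
```

Assuming a Redis server is listening on localhost at the default port 6379, `RedisIncrSketch.incr("localhost", 6379, "some-key")` would return 1 the first time and one more on each later call for the same key.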
I think Memcached is a good option for your case, along with a suitable database in the backend.