Fastest possible key->value disk storage with multiple values
I'm looking for an efficient way to store many key->value pairs
on disk for persistence, preferably with some caching.
The features needed are to either add to the value (concatenate)
for a given key or to let the model be key -> list of values;
both options are fine. The value part is typically a binary document.
I will not have much use for clustering, redundancy, etc. in this scenario.
Language-wise we're using Java, and we are experienced in classic databases (Oracle, MySQL and more).
I see a couple of obvious scenarios and would like advice on what
is fastest in terms of stores (and retrievals) per second:
1) Store the data in classic db-tables by standard inserts.
2) Do it yourself using a file system tree to spread to many files,
one or several per key.
3) Use some well-known tuple storage. Some obvious candidates are:
3a) Berkeley DB Java Edition
3b) Modern NoSQL solutions like Cassandra and similar
Personally I like Berkeley DB JE for my task.
To summarize my questions:
Does Berkeley DB JE seem like a sensible choice given the above?
What kind of speed can I expect for some operations, like
updates (insert, addition of a new value for a key) and
retrievals by key?
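Option 2 from the question (spreading values over a file system tree) could be sketched roughly as follows. This is only an illustration under assumptions that are not in the original post: the class name FileTreeStore, the SHA-256 based two-level directory fan-out, and the length-prefixed record format are all hypothetical choices, and the sketch has no caching, locking, or crash safety.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one file per key, values appended as length-prefixed records.
final class FileTreeStore {
    private final Path root;

    FileTreeStore(Path root) {
        this.root = root;
    }

    // Appends one value to the key's file, creating directories on demand.
    void append(String key, byte[] value) throws IOException {
        Path file = pathFor(key);
        Files.createDirectories(file.getParent());
        try (DataOutputStream out = new DataOutputStream(new BufferedOutputStream(
                Files.newOutputStream(file, StandardOpenOption.CREATE, StandardOpenOption.APPEND)))) {
            out.writeInt(value.length);
            out.write(value);
        }
    }

    // Reads back every value stored for the key, in insertion order.
    List<byte[]> get(String key) throws IOException {
        List<byte[]> values = new ArrayList<>();
        Path file = pathFor(key);
        if (!Files.exists(file)) {
            return values;
        }
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            while (true) {
                int length;
                try {
                    length = in.readInt();
                } catch (EOFException endOfFile) {
                    break; // no more records
                }
                byte[] value = new byte[length];
                in.readFully(value);
                values.add(value);
            }
        }
        return values;
    }

    // Hashing the key gives a safe file name and spreads files over 256 * 256 directories.
    private Path pathFor(String key) {
        String hex = sha256Hex(key);
        return root.resolve(hex.substring(0, 2)).resolve(hex.substring(2, 4)).resolve(hex);
    }

    private static String sha256Hex(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

Opening and closing a file on every append keeps the sketch simple, but it is likely to cost throughput compared with an embedded store that batches writes and keeps a cache.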
Comments (4)
You could also try Chronicle Map or JetBrains Xodus, both of which are embeddable Java key-value stores that are much faster than Berkeley DB JE (if you are really looking for speed). Chronicle Map provides an easy-to-use java.util.Map interface.
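To illustrate the java.util.Map style of access mentioned for Chronicle Map, here is a minimal sketch of the key -> list of values model from the question, written against the plain Map interface. It does not use the Chronicle Map or Xodus APIs: a ConcurrentHashMap stands in for the store, and the class name MultiValueMap is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical wrapper: key -> list of binary values on top of any java.util.Map.
final class MultiValueMap {
    private final Map<String, List<byte[]>> backing;

    MultiValueMap(Map<String, List<byte[]>> backing) {
        this.backing = backing;
    }

    // Adds one value for the key by writing a new, extended list back into the map.
    void add(String key, byte[] value) {
        backing.compute(key, (k, old) -> {
            List<byte[]> next = (old == null) ? new ArrayList<>() : new ArrayList<>(old);
            next.add(value);
            return next;
        });
    }

    // Returns the values for the key, or an empty list if the key is unknown.
    List<byte[]> get(String key) {
        return backing.getOrDefault(key, Collections.emptyList());
    }

    public static void main(String[] args) {
        MultiValueMap docs = new MultiValueMap(new ConcurrentHashMap<>());
        docs.add("invoice-1", new byte[] {1, 2, 3});
        docs.add("invoice-1", new byte[] {4});
        System.out.println("values for invoice-1: " + docs.get("invoice-1").size());
    }
}
```

The sketch writes the updated list back through compute instead of mutating a list returned by get. With a store that serializes values, an in-place change to a returned object would generally not be persisted, so this pattern is the safer assumption. Rewriting the whole list on every add gets expensive for long lists, which is one reason a store with native support for multiple values per key may be preferable.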
BerkeleyDB sounds sensible. Cassandra would also be sensible but perhaps is overkill if you don't need redundancy, clustering etc.
That said, a single Cassandra node can handle 20k writes per second (provided that you use multiple clients to exploit the high concurrency within Cassandra) on relatively modest hardware.
FWIW, I'm using Ehcache with completely satisfactory performance; I've never tried Berkeley DB.
Berkeley DB JE should work just fine for the use case that you describe. Performance will vary, depending largely on how many I/Os are required per operation (and, as a corollary, on how big the available cache is) and on the durability constraints that you define for your write transactions (i.e., whether a transaction commit has to write all the way to disk or not).
Generally speaking, we typically see 50-100K reads per second and 5-12K writes per second on commodity hardware with BDB JE. Obviously, YMMV.
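The effect of the durability constraint described above can be felt with a small experiment using only the JDK. The sketch below is not Berkeley DB JE code and says nothing about how JE is implemented: it simply appends records to a file and compares forcing every write to disk with forcing once at the end. The class name SyncCostSketch, the record size, and the record count are arbitrary assumptions, and the numbers it prints depend entirely on your hardware and file system.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical micro-benchmark: cost of syncing every write versus syncing once.
final class SyncCostSketch {

    // Appends `count` records of `size` bytes and returns the observed writes per second.
    static double writesPerSecond(Path file, int count, int size, boolean syncEachWrite)
            throws IOException {
        ByteBuffer record = ByteBuffer.allocate(size);
        long start = System.nanoTime();
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            for (int i = 0; i < count; i++) {
                record.clear(); // rewind so the whole buffer is written again
                while (record.hasRemaining()) {
                    channel.write(record);
                }
                if (syncEachWrite) {
                    channel.force(false); // wait until this record has reached the device
                }
            }
            channel.force(false); // final flush so both modes end with the data on disk
        }
        long elapsedNanos = Math.max(1, System.nanoTime() - start);
        return count / (elapsedNanos / 1e9);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("sync-cost", ".log");
        try {
            System.out.printf("sync every write: %.0f writes/s%n",
                    writesPerSecond(file, 2_000, 1024, true));
            System.out.printf("sync at the end:  %.0f writes/s%n",
                    writesPerSecond(file, 2_000, 1024, false));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

A single-threaded loop like this is only a rough indicator. A realistic measurement would use your real value sizes, a warmed-up cache, and several writer threads.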
Performance tuning and throughput questions about BDB JE are best asked on the Berkeley DB JE forum, where there is an active community of BDB JE application developers on hand to help out. There are several useful performance tuning recommendations in the BDB JE FAQ which may also come in handy.
Best of luck with your implementation. Please let us know if we can help.
Regards,
Dave -- Product Manager for Berkeley DB