Choosing between Berkeley DB Core and Berkeley DB JE
I'm designing a Java-based web app and I need a key-value store. Berkeley DB seems fitting enough for me, but there appear to be two Berkeley DBs to choose from: Berkeley DB Core, which is implemented in C, and Berkeley DB Java Edition, which is implemented in pure Java.
The question is, how do I choose which one to use? With web apps, scalability and performance are quite important (who knows, maybe my idea will become the next YouTube), and I couldn't easily find any meaningful benchmarks comparing the two. I have yet to familiarize myself with Core's Java API, but I find it hard to believe that it could be much worse than Java Edition's, which seems to be quite nice.
If some other key-value store would be much better, feel free to recommend that too. I'm storing smallish binary blobs, and keys probably will be hashes of the data, or some other unique id.
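To make the "keys will be hashes of the data" idea concrete, here is a minimal sketch of deriving a fixed-size key from a blob with SHA-256. The class and method names (`BlobKeys`, `keyFor`, `toHex`) are made up for illustration, and SHA-256 is only an assumption; the question does not say which hash is intended.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class BlobKeys {
    /** Returns the SHA-256 digest of the blob, usable as a 32-byte key. */
    static byte[] keyFor(byte[] blob) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(blob);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    /** Hex form of a key, handy for logging or debugging. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] blob = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(toHex(keyFor(blob)));
    }
}
```

Either Berkeley DB variant stores keys as plain byte arrays, so a digest like this could be used as the key directly. Identical blobs then map to the same key, which may or may not be what the application wants.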
Comments (5)
I have quite a bit of experience using both BDB-JE and BDB-core with Java. Deciding which one to use is quite simple: If you want concurrency, use BDB-JE. If you want scalability, use BDB-core.
BDB-JE breaks down performance-wise with large databases due to its file format and its reliance on Java garbage collection to clean up evicted cache entries. Expect either long garbage collection pauses or a lot of time spent tuning magic GC settings. The file format has issues too, because the background cleaner threads have to spend a lot of time cleaning up garbage created by early cache evictions. If your database fits in RAM, BDB-JE works quite well.
BDB-core relies on a page-locking strategy, and highly concurrent applications experience a lot of deadlocks. If you can randomly order operations it reduces the deadlock potential, but it never eliminates it. Because BDB-core stores data in a more traditional way, it scales to super large sizes with predictable and expected performance degradation. Because its cache is not managed by a garbage collector, it can be quite large and not cause any pauses.
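Since deadlocks can be reduced but never eliminated, the usual way to live with them is to abort and retry the transaction. Below is a minimal sketch of such a retry loop with a randomized backoff. `DeadlockDetectedException` is a hypothetical stand-in for whatever exception the store's Java binding actually throws on deadlock, and `withRetry` is an illustrative helper, not part of any Berkeley DB API.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

public class DeadlockRetry {
    /** Hypothetical stand-in for the binding's real deadlock exception. */
    static class DeadlockDetectedException extends RuntimeException {}

    /**
     * Runs txnBody, retrying when it is chosen as a deadlock victim.
     * txnBody is assumed to begin, commit, and on failure abort its own transaction.
     */
    static <T> T withRetry(int maxAttempts, Callable<T> txnBody) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return txnBody.call();
            } catch (DeadlockDetectedException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up and let the caller decide
                }
                // Randomized backoff so the competing threads do not collide again in lockstep.
                Thread.sleep(ThreadLocalRandom.current().nextLong(1, 10L * attempt + 1));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated transaction that deadlocks twice before succeeding.
        String result = withRetry(5, () -> {
            if (++calls[0] < 3) {
                throw new DeadlockDetectedException();
            }
            return "committed";
        });
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

With a real store, the body passed to `withRetry` would wrap the whole transaction, so that every retry starts from a fresh transaction rather than reusing an aborted one.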
If you derive a common interface to these, and have a suitable set of unit tests, you should be able to swap between the two trivially at a later date (perhaps when you really need to make a decision based on hard facts that are not available right now).
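A minimal sketch of what such a common interface might look like. The names `BlobStore` and `InMemoryBlobStore` are invented for illustration; a BDB-core-backed and a BDB-JE-backed implementation would sit behind the same interface, and the in-memory one is only there so the unit tests can run without either library.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public class BlobStoreSketch {
    /** Hypothetical storage interface; application code depends only on this. */
    interface BlobStore extends AutoCloseable {
        void put(byte[] key, byte[] value);
        Optional<byte[]> get(byte[] key);
        boolean delete(byte[] key);
        @Override
        void close(); // narrowed so callers need not handle a checked exception
    }

    /** Stand-in implementation for tests; not backed by Berkeley DB. */
    static final class InMemoryBlobStore implements BlobStore {
        // byte[] uses identity equality, so wrap keys in ByteBuffer for content-based lookup.
        private final Map<ByteBuffer, byte[]> data = new ConcurrentHashMap<>();

        private static ByteBuffer wrap(byte[] key) {
            return ByteBuffer.wrap(key.clone());
        }

        @Override
        public void put(byte[] key, byte[] value) {
            data.put(wrap(key), value.clone());
        }

        @Override
        public Optional<byte[]> get(byte[] key) {
            byte[] value = data.get(wrap(key));
            return value == null ? Optional.empty() : Optional.of(value.clone());
        }

        @Override
        public boolean delete(byte[] key) {
            return data.remove(wrap(key)) != null;
        }

        @Override
        public void close() {
            data.clear();
        }
    }

    public static void main(String[] args) {
        try (BlobStore store = new InMemoryBlobStore()) {
            byte[] key = "id-1".getBytes(StandardCharsets.UTF_8);
            store.put(key, new byte[] {1, 2, 3});
            System.out.println(store.get(key).map(v -> v.length).orElse(-1));
        }
    }
}
```

The same test suite can then be run against each implementation, which is what makes swapping later a low-risk change.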
I faced the same problem and decided to go with the Java edition, mainly because of its portability (I need something that would run even on mobile devices). It also offers the Direct Persistence Layer (DPL) API, and the fact that the whole db is a single jar makes deployment fairly simple.
The recent version 4 brought in high availability and performance improvements. There is also the fact that long-running Java applications can be optimized at runtime to the point where they surpass the performance of native C applications in some scenarios.
It's a natural fit for any Java application - desktop or web.
A while ago I had the same question. After doing some benchmarks, I found that hash mode in the native edition is much faster and more storage-efficient than anything the Java edition has to offer, so I decided to go with the native implementation.
I suggest you do your own benchmarks for the storage capacities you expect and decide if the Java edition is fast enough.
If it is, or if performance is not a big issue for you (it's critical for me), just go with the Java edition. Otherwise go with the native one (assuming you see the same performance boost for your own use case).
By the way:
My benchmark tested the speed of querying random keys out of 20,000,000 records, where the key is a string and the value is an int (4 bytes).
I saw that inserts (populating the benchmark) were much faster with the native version, and queries were twice as fast.
(This is not due to a shortcoming of Java, but because the Java edition was not at the same version as the native one: 4.0 vs 4.8, IIRC.)
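For anyone wanting to run a similar test, here is a rough sketch of a random-read benchmark of the kind described: string keys, int values, random lookups. It is not the code behind the numbers above. The `HashMap` is only a stand-in so the sketch runs on its own; the idea is to pass a lookup function backed by each real store instead. The names and the record count (200,000 rather than 20,000,000, to fit a default heap) are assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;
import java.util.function.Function;

public class RandomReadBenchmark {
    /** Times `queries` random lookups; returns {elapsedNanos, checksum}. */
    static long[] timeRandomReads(Function<String, Integer> lookup,
                                  int records, int queries, long seed) {
        Random rnd = new Random(seed);
        long checksum = 0; // consumed by the caller so the JIT cannot drop the loop
        long start = System.nanoTime();
        for (int i = 0; i < queries; i++) {
            Integer value = lookup.apply("key-" + rnd.nextInt(records));
            if (value == null) {
                throw new IllegalStateException("key missing from store");
            }
            checksum += value;
        }
        return new long[] {System.nanoTime() - start, checksum};
    }

    public static void main(String[] args) {
        int records = 200_000;
        int queries = 1_000_000;

        // Stand-in store; replace with a lookup against the store under test.
        Map<String, Integer> standIn = new HashMap<>();
        for (int i = 0; i < records; i++) {
            standIn.put("key-" + i, i);
        }

        timeRandomReads(standIn::get, records, queries, 1L); // warm-up pass for the JIT
        long[] result = timeRandomReads(standIn::get, records, queries, 2L);
        System.out.printf("%.1f ns/lookup (checksum %d)%n",
                (double) result[0] / queries, result[1]);
    }
}
```

For a fair comparison, the data set should be large enough to exceed the cache, and both stores should be measured on the same machine with the same key distribution.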
I decided to go with the Java Edition, simply because it's possible to embed the database runtime within the same deployable. This was an important feature for my setup. I haven't benchmarked core against JE, but I have seen great performance compared with other key-value stores that I tested when first evaluating database stores.
If you're creating a web-application though, then concurrency might be very important to you in the long run.