BerkeleyDB write performance problems
I need a disk-based key-value store that can sustain high write and read performance for large data sets. Tall order, I know.
I'm trying the C BerkeleyDB (5.1.25) library from java and I'm seeing serious performance problems.
I get solid 14K docs/s for a short while, but as soon as I reach a few hundred thousand documents the performance drops like a rock, then it recovers for a while, then drops again, etc. This happens more and more frequently, up to the point where most of the time I can't get more than 60 docs/s with a few isolated peaks of 12K docs/s after 10 million docs. My db type of choice is HASH but I also tried BTREE and it is the same.
I tried using a pool of 10 db's and hashing the docs among them to smooth out the performance drops; this increased the write throughput to 50K docs/s but didn't help with the performance drops: all 10 db's slowed to a crawl at the same time.
I presume that the files are being reorganized, and I tried to find a config parameter that affects when this reorganization takes place, so each of the pooled db's would reorganize at a different time, but I couldn't find anything that worked. I tried different cache sizes, reserving space using the setHashNumElements config option so it wouldn't spend time growing the file, but every tweak made it much worse.
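For reference, the two tweaks described above (pre-sizing the hash table and setting a cache) look roughly like this with the `com.sleepycat.db` Java bindings for BDB 5.x — a sketch, not the asker's actual code; the sizes, paths, and the 10-million-element figure are illustrative:

```java
import com.sleepycat.db.Database;
import com.sleepycat.db.DatabaseConfig;
import com.sleepycat.db.DatabaseType;
import com.sleepycat.db.Environment;
import com.sleepycat.db.EnvironmentConfig;

import java.io.File;

public class HashPreallocSketch {
    public static void main(String[] args) throws Exception {
        // Environment with a dedicated cache (size is illustrative).
        EnvironmentConfig envConfig = new EnvironmentConfig();
        envConfig.setAllowCreate(true);
        envConfig.setInitializeCache(true);
        envConfig.setCacheSize(512L * 1024 * 1024); // 512 MB

        Environment env = new Environment(new File("dbenv"), envConfig);

        DatabaseConfig dbConfig = new DatabaseConfig();
        dbConfig.setAllowCreate(true);
        dbConfig.setType(DatabaseType.HASH);
        // Pre-size the hash table so the file is not grown incrementally
        // (the Java counterpart of the setHashNumElements option mentioned above).
        dbConfig.setHashNumElements(10_000_000);

        Database db = env.openDatabase(null, "docs.db", null, dbConfig);
        // ... puts/gets go here ...
        db.close();
        env.close();
    }
}
```

This is a configuration sketch only; it requires the native BDB 5.x library on the load path.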
I'm about to give berkeleydb up and try much more complex solutions like cassandra, but I want to make sure I'm not doing something wrong in berkeleydb before writing it off.
Anybody here with experience achieving sustained write performance with berkeleydb?
Edit 1:
I tried several things already:
- Throttling the writes down to 500/s (less than the average I got after writing 30 million docs in 15 hours, which indicates the hardware is capable of writing 550 docs/s). Didn't work: once a certain number of docs has been written, performance drops regardless.
- Write incoming items to a queue. This has two problems: A) It defeats the purpose of freeing up ram. B) The queue eventually blocks because the periods during which BerkeleyDB freezes get longer and more frequent.
In other words, even if I throttle the incoming data to stay below the hardware capability and use ram to hold items while BerkeleyDB takes some time to adapt to the growth, as this time gets increasingly longer, performance approaches 0.
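The fixed-rate throttle in the first bullet can be sketched in plain Java like this (the class name and rate are hypothetical; this is the general technique, not the asker's code):

```java
import java.util.concurrent.locks.LockSupport;

// Fixed-rate throttle: allows at most `ratePerSecond` operations per second
// by tracking the earliest time the next operation may start.
class Throttle {
    private final long intervalNanos;
    private long nextSlot = System.nanoTime();

    Throttle(int ratePerSecond) {
        this.intervalNanos = 1_000_000_000L / ratePerSecond;
    }

    // Blocks until the next write slot is available.
    void acquire() {
        long now;
        // Loop because parkNanos may return early (spurious wakeup).
        while ((now = System.nanoTime()) < nextSlot) {
            LockSupport.parkNanos(nextSlot - now);
        }
        nextSlot = Math.max(nextSlot, now) + intervalNanos;
    }
}
```

Usage is just `throttle.acquire()` before each `db.put(...)`; as noted above, though, pacing the writes only delays the stalls rather than preventing them.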
This surprises me because I've seen claims that it can handle terabytes of data, yet my tests show otherwise. I still hope I'm doing something wrong...
Edit 2:
After giving it some more thought and with Peter's input, I now understand that as the file grows larger, a batch of writes will get spread farther apart and the likelihood of them falling into the same disk cylinder drops, until it eventually reaches the seeks/second limitation of the disk.
But BerkeleyDB's periodic file reorganizations are killing performance much earlier than that, and in a much worse way: it simply stops responding for longer and longer periods of time while it shuffles stuff around. Using faster disks or spreading the database files among different disks does not help. I need to find a way around those throughput holes.
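To put a rough number on that seek-bound ceiling: with typical 7200 rpm figures (assumed here: ~8 ms average seek plus ~4.2 ms average rotational latency, i.e. half a rotation), a disk tops out at roughly 80 random writes per second:

```java
public class SeekBoundEstimate {
    public static void main(String[] args) {
        double seekMs = 8.0;       // average seek time, typical 7200 rpm drive (assumed)
        double rotationalMs = 4.2; // half a rotation at 7200 rpm = 60000/7200/2 ms
        double ioMs = seekMs + rotationalMs;
        double randomWritesPerSec = 1000.0 / ioMs;
        System.out.printf("~%.0f random writes/s%n", randomWritesPerSec);
    }
}
```

That back-of-the-envelope figure lines up with the ~100 writes/s ceiling mentioned in the comments below.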
What I have seen with high rates of disk writes is that the system cache fills up (giving lightning performance up to that point), but once it is full, the application and even the whole system can slow dramatically, or even stop.
Your underlying physical disk should sustain at least 100 writes per second. Any more than that is an illusion supported by cleverer caching. ;) However, when the caching system is exhausted, you will see very bad behaviour.
I suggest you consider a disk controller cache. Its battery-backed memory would need to be about the size of your data.
Another option is to use SSD drives if the updates are bursty (they can do 10K+ writes per second as they have no moving parts), combined with caching. This should give you more than you need, but SSDs have a limited number of writes.
BerkeleyDB does not perform file reorganizations unless you manually invoke the compaction utility. There are several possible causes of the slowdown:
When you say "documents", do you mean that you're using BDB to store records larger than a few kilobytes? BDB overflow pages carry more overhead, so you should consider using a larger page size.
This is an old question and the problem is probably gone, but I have recently had similar problems (speed of insert dropping dramatically after few hundred thousand records) and they were solved by giving more cache to the database (DB->set_cachesize). With 2GB of cache the insert speed was very good and more or less constant up to 10 million records (I didn't test further).
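The Java counterpart of the C API's `DB->set_cachesize` is configured on the environment — a sketch assuming the `com.sleepycat.db` bindings, with the 2 GB figure taken from this answer:

```java
import com.sleepycat.db.Environment;
import com.sleepycat.db.EnvironmentConfig;

import java.io.File;

// Give the shared memory pool a 2 GB cache before opening any databases.
EnvironmentConfig envConfig = new EnvironmentConfig();
envConfig.setAllowCreate(true);
envConfig.setInitializeCache(true);
envConfig.setCacheSize(2L * 1024 * 1024 * 1024); // 2 GB
Environment env = new Environment(new File("dbenv"), envConfig);
```

This is a configuration fragment; it requires the native BDB library and an existing `dbenv` directory.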
We have used BerkeleyDB (BDB) at work and have seen similar performance trends. BerkeleyDB uses a Btree to store its key/value pairs. As the number of entries keeps increasing, the depth of the tree increases. BerkeleyDB's caching loads the tree into RAM so that a traversal does not incur file IO (reading from disk).
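A rough illustration of why the cache matters as the tree deepens — the fanout figure here is a hypothetical assumption, not a BDB constant; each tree level that doesn't fit in cache costs one disk read per lookup:

```java
public class BtreeDepthEstimate {
    public static void main(String[] args) {
        int fanout = 500;           // assumed entries per internal page (hypothetical)
        long entries = 10_000_000L; // the document count from the question
        // Depth grows logarithmically with the entry count.
        int depth = (int) Math.ceil(Math.log(entries) / Math.log(fanout));
        System.out.println("estimated tree depth: " + depth);
    }
}
```

So at 10 million entries a lookup may touch ~3 pages; once those upper levels no longer fit in cache, every operation starts paying seek-bound disk reads, which matches the drop-off described in the question.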
Chronicle Map is a modern solution for this task. It's much faster than BerkeleyDB on both reads and writes, and is much more scalable in terms of concurrent access from multiple threads/processes.