BerkeleyDB Concurrency

Asked 2024-07-03 23:37:33

  • What's the optimal level of concurrency that the C++ implementation of BerkeleyDB can reasonably support?
  • How many threads can I have hammering away at the DB before throughput starts to suffer because of resource contention?

I've read the manual and know how to set the number of locks, lockers, database page size, etc. but I'd just like some advice from someone who has real-world experience with BDB concurrency.

My application is pretty simple, I'll be doing gets and puts of records that are about 1KB each. No cursors, no deleting.
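For reference, the workload described (concurrent ~1 KB gets and puts, no cursors, no deletes) can be modeled with a small driver. This is a hypothetical sketch: it uses a lock-guarded dict as a stand-in for the BDB handle, since a real test would go through BDB's `Db::get` / `Db::put`:

```python
import os
import threading

class KVStore:
    """Hypothetical stand-in for a BDB handle: a dict guarded by a lock.

    A real benchmark would call BDB's Db::get / Db::put; this only
    shapes the workload (concurrent ~1 KB gets and puts, no cursors).
    """
    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def put(self, key: bytes, value: bytes) -> None:
        with self._lock:
            self._data[key] = value

    def get(self, key: bytes):
        with self._lock:
            return self._data.get(key)

def worker(store: KVStore, ops: int) -> None:
    """One thread's share of the workload: put, then read back, ~1 KB records."""
    record = os.urandom(1024)  # ~1 KB record, as in the question
    for i in range(ops):
        key = str(i).encode()
        store.put(key, record)
        assert store.get(key) == record
```

Spawning N threads over `worker` and varying N gives a baseline harness for the contention question above.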

森林迷了鹿 2024-07-10 23:37:34

The way I understand things, Samba created tdb to allow "multiple concurrent writers" for any particular database file. So if your workload has multiple writers your performance may be bad (as in, the Samba project chose to write its own system, apparently because it wasn't happy with Berkeley DB's performance in this case).

On the other hand, if your workload has lots of readers, then the question is how well your operating system handles multiple readers.

一片旧的回忆 2024-07-10 23:37:34

What I did when working against a database of unknown performance was to measure turnaround time on my queries. I kept upping the thread count until turnaround time degraded, and dropping the thread count until it improved (well, it was processes in my environment, but whatever).

There were moving averages and all sorts of metrics involved, but the take-away lesson was: just adapt to how things are working at the moment. You never know when the DBAs will improve performance or hardware will be upgraded, or perhaps another process will come along to load down the system while you're running. So adapt.

Oh, and another thing: avoid process switches if you can - batch things up.


Oh, I should make this clear: this all happened at run time, not during development.
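The run-time adaptation described above could be sketched like this. The class name, window size, and one-step-up/one-step-down policy are all assumptions for illustration, not from any library; the point is simply "adapt to how things are working at the moment":

```python
import collections
import statistics

class AdaptiveWorkerPool:
    """Illustrative controller: tune worker count from measured turnaround time.

    Hypothetical sketch -- record() each query's turnaround, then adapt()
    steps the worker count up while the moving average keeps improving,
    and backs off when it gets worse.
    """
    def __init__(self, start_workers: int = 4, window: int = 50):
        self.workers = start_workers
        self.best_avg = float("inf")
        self._samples = collections.deque(maxlen=window)

    def record(self, turnaround_seconds: float) -> None:
        self._samples.append(turnaround_seconds)

    def adapt(self) -> int:
        """Once a full window of samples is in, step the worker count."""
        if len(self._samples) < self._samples.maxlen:
            return self.workers
        avg = statistics.mean(self._samples)
        if avg <= self.best_avg:
            self.best_avg = avg
            self.workers += 1                        # still improving: add a worker
        else:
            self.workers = max(1, self.workers - 1)  # got worse: back off
        self._samples.clear()
        return self.workers
```

A real deployment would also want the moving averages and decay the answer mentions, but the adapt-at-run-time shape is the same.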

梦太阳 2024-07-10 23:37:34

I strongly agree with Daan's point: create a test program, and make sure the way in which it accesses data mimics as closely as possible the patterns you expect your application to have. This is extremely important with BDB because different access patterns yield very different throughput.

Other than that, these are general factors I found to be of major impact on throughput:

  1. Access method (which in your case I guess is BTREE).

  2. The level of durability with which you configured BDB (for example, in my case the 'DB_TXN_WRITE_NOSYNC' environment flag improved write performance by an order of magnitude, but it compromises durability).

  3. Does the working set fit in cache?

  4. The ratio of reads to writes.

  5. How spread out your accesses are (remember that BTREE uses page-level locking, so accessing different pages from different threads is a big advantage).

  6. Access pattern - meaning how likely threads are to lock one another, or even deadlock, and what your deadlock resolution policy is (this one may be a killer).

  7. Hardware (disk & memory for cache).

This boils down to one point: there are two key ways to scale a BDB-based solution for greater concurrency — either minimize the number of locks in your design, or add more hardware.
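The page-level locking factor above can be illustrated with a toy sketch. The key-to-page mapping here is an assumption for demonstration only — real BDB derives pages from its BTREE layout — but it shows why spreading keys across pages reduces writer contention:

```python
import threading

KEYS_PER_PAGE = 4  # hypothetical: pretend 4 records fit on one BTREE page

_page_locks: dict = {}
_guard = threading.Lock()

def page_lock_for(key: int) -> threading.Lock:
    """Map a key to a per-page lock, mimicking BTREE page-level locking."""
    page = key // KEYS_PER_PAGE
    with _guard:
        return _page_locks.setdefault(page, threading.Lock())

def put(store: dict, key: int, value: bytes) -> None:
    # Writers hitting the same page serialize on one lock;
    # writers spread across different pages proceed independently.
    with page_lock_for(key):
        store[key] = value
```

Threads inserting sequential keys all land on the same "page" and serialize; threads whose keys scatter across pages rarely contend — the same effect the answer describes for BDB's BTREE.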

吲‖鸣 2024-07-10 23:37:34

It depends on what kind of application you are building. Create a representative test scenario, and start hammering away. Then you will know the definitive answer.

Besides your use case, it also depends on CPU, memory, front-side bus, operating system, cache settings, etcetera.

Seriously, just test your own scenario.

If you need some numbers (that actually may mean nothing in your scenario):

若水微香 2024-07-10 23:37:34

Doesn't this depend on the hardware as well as number of threads and stuff?

I would make a simple test, run it with an increasing number of threads hammering away, and see what looks best.
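That thread-sweep idea could be scripted roughly as follows. The function names and sweep range are illustrative; `workload` would wrap the real BDB get/put calls:

```python
import concurrent.futures
import time

def measure_throughput(workload, n_threads: int, n_ops: int = 1000) -> float:
    """Run n_ops calls of `workload` across n_threads; return ops/second."""
    start = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=n_threads) as pool:
        for f in [pool.submit(workload) for _ in range(n_ops)]:
            f.result()
    return n_ops / (time.perf_counter() - start)

def sweep(workload, max_threads: int = 16):
    """Try increasing thread counts; report which level looked best."""
    results = {n: measure_throughput(workload, n)
               for n in range(1, max_threads + 1)}
    return max(results, key=results.get), results
```

Note that for a CPython workload the GIL will dominate unless the BDB binding releases it during I/O, so treat any numbers as relative, not absolute.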
