Best key/value database in C for a very large number of entries

Posted 2024-12-01 16:07:21

I am trying to create a key/value database with 300,000,000 key/value pairs of 8 bytes each (both for the key and the value). The requirement is to have a very fast key/value mechanism which can query about 500,000 entries per second.

I tried BDB, Tokyo DB, Kyoto DB, and LevelDB, and they all perform very badly with a database of that size. (Their performance is not even close to their benchmarked rates at 1,000,000 entries.)

I cannot store my database in memory because of hardware limitations (32 bit software), so memcached is out of the question.

I cannot use external server software either (only a database module), and there is no need for multi-user support at all. Of course, server software cannot handle 500,000 queries per second from a single endpoint anyway, so that leaves out Redis, Tokyo Tyrant, etc.

Comments (7)

孤独陪着我 2024-12-08 16:07:21

David Segleau, here. Product Manager for Berkeley DB.

The most common problem with BDB performance is that people don't configure the cache size, leaving it at the default, which is pretty small. The second most common problem is that people write application behavior emulators that do random look-ups (even though their application is not really completely random), which forces them to read data out of cache. The random I/O then takes them down a path of conclusions about performance that are based on the simulated application rather than the actual application behavior.
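
For reference, here is a minimal sketch of what configuring the cache explicitly looks like with the Berkeley DB C API. The path, the 1 GB cache size, and the choice of the hash access method are illustrative assumptions, not tuned recommendations:

    /* Sketch: open a BDB environment with an explicit 1 GB cache instead of
     * the small default (roughly 256 KB). Error handling is kept minimal. */
    #include <stdio.h>
    #include <db.h>

    int main(void) {
        DB_ENV *env;
        DB *db;
        int ret;

        if ((ret = db_env_create(&env, 0)) != 0) {
            fprintf(stderr, "db_env_create: %s\n", db_strerror(ret));
            return 1;
        }

        /* 1 GB cache: (gbytes, bytes, ncache). */
        env->set_cachesize(env, 1, 0, 1);

        if ((ret = env->open(env, "/path/to/env",
                             DB_CREATE | DB_INIT_MPOOL, 0)) != 0) {
            fprintf(stderr, "env->open: %s\n", db_strerror(ret));
            return 1;
        }

        /* Hash access method is a natural fit for fixed-size point lookups. */
        if ((ret = db_create(&db, env, 0)) != 0 ||
            (ret = db->open(db, NULL, "kv.db", NULL, DB_HASH, DB_CREATE, 0)) != 0) {
            fprintf(stderr, "db open: %s\n", db_strerror(ret));
            return 1;
        }

        /* ... point lookups with db->get() would go here ... */

        db->close(db, 0);
        env->close(env, 0);
        return 0;
    }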

From your description, I'm not sure if you're running into these common problems or maybe into something else entirely. In any case, our experience is that Berkeley DB tends to perform and scale very well. We'd be happy to help you identify any bottlenecks and improve your BDB application throughput. The best place to get help in this regard would be on the BDB forums at: http://forums.oracle.com/forums/forum.jspa?forumID=271. When you post to the forum it would be useful to show the critical query segments of your application code and the db_stat output showing the performance of the database environment.

It's likely that you will want to use BDB HA/Replication in order to load balance the queries across multiple servers. 500K queries/second is probably going to require a larger multi-core server or a series of smaller replicated servers. We've frequently seen BDB applications with 100-200K queries/second on commodity hardware, but 500K queries per second on 300M records in a 32-bit application is likely going to require some careful tuning. I'd suggest focusing on optimizing the performance of the queries in the BDB application running on a single node, and then using HA to distribute that load across multiple systems in order to scale your queries/second throughput.

I hope that helps.

Good luck with your application.

Regards,

Dave

那一片橙海, 2024-12-08 16:07:21

I found a good benchmark comparison web page that basically compares 5 renowned databases:

  • LevelDB
  • Kyoto TreeDB
  • SQLite3
  • MDB
  • BerkeleyDB

You should check it out before making your choice: http://symas.com/mdb/microbench/.

P.S. - I know you've already tested them, but you should also consider that your configuration for each of those tests may not have been optimized, since the benchmark results suggest otherwise.
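
As a concrete example of that kind of per-engine tuning, here is a sketch using the LevelDB C API to give the store an explicit block cache before benchmarking; the 512 MB size and the path are illustrative assumptions only:

    /* Sketch: open LevelDB with an explicit LRU block cache (the default
     * cache is only about 8 MB). Values here are placeholders, not tuned. */
    #include <stdio.h>
    #include <leveldb/c.h>

    int main(void) {
        leveldb_options_t *opts = leveldb_options_create();
        leveldb_cache_t *cache = leveldb_cache_create_lru(512 * 1024 * 1024);
        char *err = NULL;

        leveldb_options_set_create_if_missing(opts, 1);
        leveldb_options_set_cache(opts, cache);

        leveldb_t *db = leveldb_open(opts, "/path/to/db", &err);
        if (err != NULL) {
            fprintf(stderr, "open failed: %s\n", err);
            leveldb_free(err);
            return 1;
        }

        /* ... run the read workload here ... */

        leveldb_close(db);
        leveldb_options_destroy(opts);
        leveldb_cache_destroy(cache);
        return 0;
    }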

好菇凉咱不稀罕他 2024-12-08 16:07:21

Try ZooLib.

It provides a database with a C++ API that was originally written for a high-performance multimedia database for educational institutions called Knowledge Forum. It could handle 3,000 simultaneous Mac and Windows clients (also written in ZooLib - it's a cross-platform application framework), all of them streaming audio and video and working with graphically rich documents created by the teachers and students.

It has two low-level APIs for actually writing your bytes to disk. One is very fast but is not fault-tolerant. The other is fault-tolerant but not as fast.

I'm one of ZooLib's developers, but I don't have much experience with ZooLib's database component. There is also no documentation - you'd have to read the source to figure out how it works. That's my own damn fault, as I took on the job of writing ZooLib's manual over ten years ago, but barely started it.

ZooLib's primary developer, Andy Green, is a great guy and always happy to answer questions. What I suggest you do is subscribe to ZooLib's developer list at SourceForge and then ask on the list how to use the database. Most likely Andy will answer you himself, but maybe one of our other developers will.

ZooLib is Open Source under the MIT License, and is really high-quality, mature code. It has been under continuous development since 1990 or so, and was placed in Open Source in 2000.

Don't be concerned that we haven't released a tarball since 2003. We probably should, as this leads lots of potential users to think it's been abandoned, but it is very actively used and maintained. Just get the source from Subversion.

Andy is a self-employed consultant. If you don't have time but you do have a budget, he would do a very good job of writing custom, maintainable top-quality C++ code to suit your needs.

I would too, if it were any part of ZooLib other than the database, which as I said I am unfamiliar with. I've done a lot of my own consulting work with ZooLib's UI framework.

简美 2024-12-08 16:07:21

300M * 8 bytes = 2.4 GB. That will probably fit into memory (if the OS does not restrict the address space to 31 bits).
Since you'll also need to handle overflow (either by a rehashing scheme or by chaining), memory gets even tighter: for linear probing you probably need > 400M slots, and chaining will increase the size of an item to 12 bytes (bit fiddling might gain you back a few bits). That would increase the total footprint to circa 3.6 GB.

In any case you will need a specially crafted kernel that restricts its own "reserved" address space to a few hundred MB. Not impossible, but a major operation. Escaping to something disk-based would be too slow in all cases. (PAE could save you, but it is tricky.)

IMHO your best choice would be to migrate to a 64-bit platform.
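
To make the arithmetic above easy to re-check, here is a tiny back-of-the-envelope calculation using the same assumption of 8 bytes per key/value item and the slot counts from this answer:

    /* Footprint estimates for an in-memory table of 300M 8-byte items. */
    #include <stdio.h>

    int main(void) {
        const double items = 300e6;

        double raw      = items * 8.0;    /* bare items:                  ~2.4 GB */
        double probing  = 400e6 * 8.0;    /* linear probing, >400M slots: ~3.2 GB */
        double chaining = items * 12.0;   /* 8-byte item + 4-byte link:   ~3.6 GB */

        printf("raw items:      %.1f GB\n", raw / 1e9);
        printf("linear probing: %.1f GB\n", probing / 1e9);
        printf("chaining:       %.1f GB\n", chaining / 1e9);
        return 0;
    }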

仙女山的月亮 2024-12-08 16:07:21

500,000 entries per second without holding the working set in memory? Wow.

In the general case this is not possible with HDDs, and difficult even with SSDs.

Have you any locality properties that might help to make the task a bit easier? What kind of queries do you have?

不忘初心 2024-12-08 16:07:21

We use Redis. Written in C, it's only slightly more complicated than memcached by design. We've never tried to use that many rows, but for us latency is very important; it handles those latencies well and lets us store the data on disk.

Here is a benchmark blog entry comparing Redis and memcached.

蹲墙角沉默 2024-12-08 16:07:21

Berkeley DB could do it for you.
I achieved 50,000 inserts per second about 8 years ago, with a final database of 70 billion records.
