Best practices for deploying a high-performance Berkeley DB system
I am looking to use Berkeley DB to create a simple key-value storage system. The keys will be SHA-1 hashes, so they live in a 160-bit address space. I have a simple server working; that was easy enough thanks to the fairly well-written documentation on the Berkeley DB website. However, I have some questions about how best to set up such a system to get good performance and flexibility. Hopefully someone with more Berkeley DB experience can help me.
The simplest setup is a single process, with a single thread, handling a single DB; inserts and gets are performed on this one DB, using transactions.
Alternative 1: single process, multiple threads, single DB; inserts and gets are performed on this DB, by all the threads in the process.
- Does using multiple threads provide much of a performance improvement? There is a single DB, so it sits on one disk, and I am guessing I won't get much of a boost. But if Berkeley DB caches a lot of data in memory, then perhaps one thread can run and answer from cache while another is blocked waiting for disk? I am using GNU Pth, which provides user-level cooperative threading. I am not familiar with the details of Pth, so I am also not sure whether one user-level thread can run while another user-level thread is blocked.
Alternative 2: single process, one or multiple threads, multiple DBs where each DB covers a fraction of the 160-bit address space for keys.
- I see a few advantages in having multiple DBs: we can put them on different disks, less contention, easier to move/partition DBs onto different physical hosts if we want to do that. Does anyone have experience with this setup and see significant benefits?
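To make the partitioning idea concrete, here is a minimal sketch of one way to map a SHA-1 key to a DB when the 160-bit key space is split into equal contiguous ranges. The function name `shard_for_key` and the equal-range scheme are assumptions for illustration only; the sketch does not touch the Berkeley DB API, it only picks which DB (or host) a key would go to.

```python
import hashlib


def shard_for_key(digest: bytes, num_shards: int) -> int:
    """Return the index of the shard owning this SHA-1 digest.

    Hypothetical helper: the 160-bit key space is divided into
    num_shards equal, contiguous ranges, and shard i owns range i.
    """
    if len(digest) != 20:
        raise ValueError("expected a 20-byte SHA-1 digest")
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    # Interpret the digest as an unsigned 160-bit big-endian integer.
    key_as_int = int.from_bytes(digest, "big")
    # floor(key * num_shards / 2**160) always lies in [0, num_shards).
    return (key_as_int * num_shards) >> 160


# Example: pick one of 4 DBs for the key of some stored value.
key = hashlib.sha1(b"some value").digest()
shard_index = shard_for_key(key, 4)
```

Because SHA-1 output is close to uniformly distributed, equal ranges should receive roughly equal load. Keeping the ranges contiguous also means a whole range can later be moved to another disk or host without rehashing the other keys.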
Alternative 3: multiple processes, each with one thread, each handling a DB that covers a fraction of the 160-bit address space for keys.
- This has the advantages of using multiple DBs, but with multiple processes instead of threads. Is this better than the second alternative? I suspect that using processes rather than user-level threads for parallelism will give better SMP caching behavior (fewer cache invalidations, etc.), but will I get killed by the process overhead and context switches?
I would love to hear from anyone who has tried these options and has seen positive or negative results.
Thanks.
Comments (1)
Alternative 2 gives you high scalability. You basically partition your database across multiple servers. If you need a high-performance distributed key/value database, I would suggest looking at membase. I am doing something similar right now, but we need to run on an appliance and would like to limit dependencies (such as membase).
You can also use Berkeley DB replication and have servers with read-only copies serve the read/get requests.
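As a rough illustration of the read-replica idea, the sketch below routes writes to a single master and spreads reads across replicas in round-robin order. The class `ReplicatedStoreRouter` and its method names are hypothetical, and the code deliberately does not use the Berkeley DB replication API; it only shows the routing decision a front-end server might make.

```python
import itertools


class ReplicatedStoreRouter:
    """Hypothetical router: writes go to the master, reads rotate over replicas."""

    def __init__(self, master, replicas):
        self.master = master
        # Copy the replica list so later changes by the caller do not affect routing.
        replicas = list(replicas)
        self._replica_cycle = itertools.cycle(replicas) if replicas else None

    def target_for_put(self):
        # All inserts and updates must go to the single writable copy.
        return self.master

    def target_for_get(self):
        # With no replicas configured, fall back to reading from the master.
        if self._replica_cycle is None:
            return self.master
        return next(self._replica_cycle)


# Example with placeholder host names (assumptions, not real servers).
router = ReplicatedStoreRouter("master-host", ["replica-1", "replica-2"])
write_target = router.target_for_put()
read_target = router.target_for_get()
```

Depending on how replication is configured, a replica may briefly lag behind the master, so a get issued right after a put might need to go to the master if it must see the new value.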