Alternative or successor to GDBM

Posted 2024-07-15 22:22:03

We have a GDBM key-value database as the backend to a load-balanced, web-facing application that is implemented in C++. The data served by the application has grown very large, so our admins have moved the GDBM files from "local" storage (on the webservers, or very close by) to a large, shared, remote, NFS-mounted filesystem.

This has affected performance. Our performance tests (in a test environment) show page load times jumping from hundreds of milliseconds (for local disk) to several seconds (over NFS, local network), and sometimes getting as high as 30 seconds. I believe a large part of the problem is that the application makes lots of random reads from the GDBM files, and that these are slow over NFS, and this will be even worse in production (where the front-end and back-end have even more network hardware between them) and as our database gets even bigger.

While this is not a critical application, I would like to improve performance, and I have some resources available, including application developer time and Unix admins. My main constraint is time: we only have the resources for a few weeks.

As I see it, my options are:

  1. Improve NFS performance by tuning parameters. My instinct is we won't get much out of this, but I have been wrong before, and I don't really know very much about NFS tuning.

  2. Move to a different key-value database, such as memcachedb or Tokyo Cabinet.

  3. Replace NFS with some other protocol (iSCSI has been mentioned, but I am not familiar with it).

How should I approach this problem?
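
Before committing to any of these options, it may help to confirm the suspicion about random reads with numbers. The following is a minimal sketch, not part of the application: it times random block reads against a file so the same script can be pointed at a copy on local disk and a copy on the NFS mount. The function names, the 4096-byte block size, and the read count are all assumptions chosen for illustration.

```python
import os
import random
import time


def time_random_reads(path, n_reads=1000, block_size=4096, seed=0):
    """Return per-read latencies (seconds) for n_reads random block reads."""
    rng = random.Random(seed)
    size = os.path.getsize(path)
    last_offset = max(size - block_size, 0)
    latencies = []
    fd = os.open(path, os.O_RDONLY)
    try:
        for _ in range(n_reads):
            offset = rng.randint(0, last_offset)
            start = time.perf_counter()
            os.pread(fd, block_size, offset)  # positional read, Unix only
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
    return latencies


def summarize(latencies):
    """Return (median, 99th percentile, max) of a list of latencies."""
    ordered = sorted(latencies)
    p99_index = min(len(ordered) - 1, int(len(ordered) * 0.99))
    return ordered[len(ordered) // 2], ordered[p99_index], ordered[-1]
```

Calling `summarize(time_random_reads(path))` once for each copy of the file gives a rough side-by-side comparison. Repeated runs can be served from the client's page cache, so a second pass over the same file may look much better than a cold read.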

Comments (4)

影子是时光的心 2024-07-22 22:22:03

Don't get too hung up on the “relational versus non-relational” comparison. It appears to be irrelevant for this issue.

The line your application has crossed is a different one: from a small database on local fast file storage, to a large database accessed over the network. Crossing that line means you are now better served by a dedicated, network serviced, database management system. Whether the management server manages relational databases isn't relevant for that aspect.

For getting it up and running quickly, MariaDB (the successor to MySQL) is probably your best bet. If you foresee it growing much beyond where it is now, you might as well put it in PostgreSQL since that's where it will need to go eventually anyway :-)

吾家有女初长成 2024-07-22 22:22:03

This appears not to be what you want to hear, but honestly, if I were you I'd throw it in a MySQL table. It's not as if it's meaningfully harder to work with, and you get a lot of benefits with it, not least a remote access protocol that's actually intended for your situation, unlike GDBM-over-NFS.
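
To give an idea of how little changes on the application side, here is a rough sketch of a key-value table with `get` and `put` helpers. It is only an illustration under stated assumptions: the table name `kv`, the column names, and the helper functions are made up, and Python's built-in `sqlite3` stands in for a MySQL client purely so the sketch runs without a server. SQLite is itself a local file database, so it would not fix the NFS problem; the point is the shape of the table and queries.

```python
import sqlite3


def open_store(conn):
    """Create the (hypothetical) key-value table if it does not exist."""
    # Column types are written the way they might look on MySQL/MariaDB;
    # SQLite accepts these type names as well.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS kv ("
        " k VARBINARY(255) PRIMARY KEY,"
        " v LONGBLOB NOT NULL)"
    )
    return conn


def put(conn, key, value):
    """Insert the value, or overwrite it if the key already exists."""
    conn.execute("REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value))
    conn.commit()


def get(conn, key):
    """Return the stored bytes for key, or None if the key is absent."""
    row = conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
    return row[0] if row else None
```

With a real MySQL or MariaDB driver the connection setup and the parameter placeholder style would differ (many drivers use `%s` rather than `?`), but a primary-key lookup like the one in `get` is a single round trip to the server instead of several random reads over NFS.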

微凉 2024-07-22 22:22:03

If you want to stick to non-relational databases, you could try BDB or DJB's CDB. I have used both, and I think that when it comes to performance they outperform GDBM.

But keep bignose's answer in mind, as I, too, think that your bottleneck might not be the data structure (GDBM) you are using but your infrastructure.

爱人如己 2024-07-22 22:22:03

File system I/O on flat files over a network is not a good idea. Instead, you should consider writing a multi-threaded TCP server that runs on the machine holding the data, does the I/O, queries, etc. there, and returns the results to you. Transfer small chunks of data, not whole db files.

I'm designing a cache-persistence mechanism to overcome a high-availability problem. I will code it in Python.
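
A very small sketch of the lookup-server idea, using only the Python standard library, could look like the following. Everything here is an assumption made for illustration: the wire format (a newline-terminated key in, a 4-byte length followed by the value out, with length -1 meaning "not found"), the class names, and the `store` attribute, which can be any mapping from bytes to bytes. On the storage host that mapping might be a handle to the existing GDBM file; in the sketch it is left abstract. Keys containing a newline are not supported by this toy protocol.

```python
import socket
import socketserver
import struct
import threading


class LookupHandler(socketserver.StreamRequestHandler):
    """Answer newline-terminated keys with length-prefixed values."""

    def handle(self):
        for line in self.rfile:
            key = line.rstrip(b"\n")
            # Serialize access, since the store may not be thread-safe.
            with self.server.lock:
                value = self.server.store.get(key)
            if value is None:
                # Length -1 signals "key not found".
                self.wfile.write(struct.pack("!i", -1))
            else:
                self.wfile.write(struct.pack("!i", len(value)) + value)


class LookupServer(socketserver.ThreadingTCPServer):
    """One thread per client connection; lookups run next to the data."""

    allow_reuse_address = True
    daemon_threads = True

    def __init__(self, address, store):
        super().__init__(address, LookupHandler)
        self.store = store  # any bytes -> bytes mapping (assumption)
        self.lock = threading.Lock()


def recv_exact(sock, n):
    """Read exactly n bytes from sock, or raise if the peer disconnects."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("server closed the connection")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def lookup(sock, key):
    """Client side: send one key, return the value bytes or None."""
    sock.sendall(key + b"\n")
    (length,) = struct.unpack("!i", recv_exact(sock, 4))
    return None if length < 0 else recv_exact(sock, length)
```

The web servers would keep a connection open and call `lookup` per key, so only the requested values cross the network. A single lock is the simplest safe choice here, but it also serializes all reads, so a real deployment would want something less coarse.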
