How much storage does CouchDB need compared to an RDBMS?

Posted 2024-09-14 06:21:41

I need to know the factors that need to be taken into consideration when implementing a solution using CouchDB. I understand that CouchDB does not require normalization and that most of the standard techniques I use in RDBMS development are thrown away.

But what exactly are the costs involved? I understand the benefits perfectly, but the storage costs make me a bit nervous, as it appears that CouchDB would need an awful lot of replicated data, some of it going stale and out of date well before it is used. How would one manage stale data?

I know that I could implement some awkward relational model with documents in CouchDB and lower the storage costs, but wouldn't this defeat the objectives of CouchDB and the performance gains I could get?

An example I am thinking about is a system for requisitions, ordering and tendering. The system currently has a one-to-many relationship in play, and the "many" side might get updated more frequently than the "one".
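For concreteness, the kind of structure I am wrestling with looks roughly like the sketch below (all IDs and field names are made up for illustration); the choice seems to be between embedding the "many" side in the parent document or splitting each item into its own document with a back-reference:

```python
# Hypothetical document shapes for a requisition with line items (all names invented).

# Option A: embed the "many" side. Every edit to any line item creates a new
# revision of the whole requisition document.
requisition_embedded = {
    "_id": "req:2024-0001",
    "type": "requisition",
    "status": "open",
    "line_items": [
        {"sku": "A-100", "qty": 5, "unit_price": 12.50},
        {"sku": "B-200", "qty": 2, "unit_price": 99.00},
    ],
}

# Option B: reference instead of embed. Frequently updated line items no longer
# churn the parent document; a view keyed on "requisition_id" can join them
# back together at read time.
requisition_header = {
    "_id": "req:2024-0001",
    "type": "requisition",
    "status": "open",
}
line_item = {
    "_id": "req:2024-0001:item:1",
    "type": "line_item",
    "requisition_id": "req:2024-0001",
    "sku": "A-100",
    "qty": 5,
    "unit_price": 12.50,
}
```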

Any help would be great, as I am an old-school RDBMS guy steeped in the teachings of C.J. Date, E.F. Codd and R.F. Boyce, and am currently struggling with the radical notion of document storage.

Does CouchDB have anything internal to manage the recognition and reduction of duplicate data?

Comments (1)

夕嗳→ 2024-09-21 06:21:41

Only you know how many copies of how much data you will use, so unfortunately the only good answer will be to build simulated data sets and measure the disk usage.
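As a rough illustration of what that measurement could look like (assuming a local CouchDB 2.x/3.x instance on localhost:5984; the credentials, database name and document shape are all invented):

```python
import requests

COUCH = "http://admin:password@localhost:5984"  # assumed local instance and credentials
DB = "sizing_test"                              # hypothetical test database

# Create the test database (a 412 response just means it already exists).
requests.put(f"{COUCH}/{DB}")

# Bulk-insert simulated documents shaped like your real data.
docs = [
    {"type": "requisition", "status": "open",
     "line_items": [{"sku": f"A-{i}", "qty": i % 7}]}
    for i in range(1000)
]
requests.post(f"{COUCH}/{DB}/_bulk_docs", json={"docs": docs})

# The database info document reports sizes: "file" is bytes on disk,
# "active" is the live data the file would shrink to after compaction.
info = requests.get(f"{COUCH}/{DB}").json()
print(info["sizes"])
```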

In addition, similar to a file system, CouchDB requires additional storage for metadata. This cost depends on two factors:

  1. How often you update or create a document
  2. How often you compact

The worst-case instantaneous disk usage will be the total amount of data times two, plus all the old document revisions (#1) existing at compaction time (#2). This is because compaction builds a new database file with only the current document revisions. Therefore the usage will be two copies of the current data (from the old file plus the new file), plus all of the "wasted" old revisions awaiting deletion when compaction completes. After compaction, the old file is deleted, so you will reclaim over half of this worst-case value.
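To make that arithmetic concrete, here is a back-of-the-envelope example with invented numbers:

```python
# Invented figures, purely for illustration.
current_data_gb = 10      # live (current-revision) data
stale_revisions_gb = 4    # old revisions accumulated since the last compaction

# During compaction the old file (current + stale revisions) and the new file
# (current revisions only) coexist on disk, so the instantaneous peak is roughly:
peak_gb = 2 * current_data_gb + stale_revisions_gb   # 24 GB
# Once compaction finishes and the old file is deleted, usage falls back to ~10 GB.
print(peak_gb)
```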

Running compaction continuously is fine for keeping data usage down, but it has implications for disk I/O.
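If you do decide to schedule compaction yourself, it is just an HTTP call; a minimal sketch (same assumed instance and database name as above):

```python
import requests

COUCH = "http://admin:password@localhost:5984"  # assumed local instance and credentials
DB = "sizing_test"                              # hypothetical database name

# Ask CouchDB to compact the database; it runs in the background and the
# request returns immediately with {'ok': True}.
resp = requests.post(f"{COUCH}/{DB}/_compact",
                     headers={"Content-Type": "application/json"})
print(resp.json())
```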
