I was wondering why my CouchDB database was growing so fast, so I wrote a little test script. The script changes an attribute of a CouchDB document 1200 times and records the size of the database after each change. After these 1200 write steps the database performs a compaction step and the size is measured again. Finally, the script plots the database size against the revision numbers. The benchmark is run twice, as sketched below:
- The first time, the default number of document revisions (`_revs_limit` = 1000) is used.
- The second time, the number of document revisions is set to 1.
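For reference, a minimal sketch of this kind of benchmark might look like the following. It assumes a local CouchDB at `http://localhost:5984`, the Python `requests` library, and a made-up database name `size_benchmark`; the endpoints used (`_revs_limit`, `_compact`, the database info document) are the standard CouchDB HTTP API. Plotting is omitted, and compaction normally requires admin credentials, which are left out here.

```python
import time
import requests

BASE = "http://localhost:5984"
DB = f"{BASE}/size_benchmark"   # database name is made up for this sketch

def db_size():
    # Older CouchDB versions report `disk_size`; newer ones report `sizes.file`.
    info = requests.get(DB).json()
    return info.get("sizes", {}).get("file", info.get("disk_size"))

def run_benchmark(revs_limit, steps=1200):
    requests.delete(DB)                         # start from a fresh database
    requests.put(DB)
    requests.put(f"{DB}/_revs_limit", json=revs_limit)

    rev = requests.put(f"{DB}/doc", json={"counter": 0}).json()["rev"]
    sizes = []
    for i in range(1, steps + 1):
        # change a single attribute, creating a new revision each time
        rev = requests.put(f"{DB}/doc", json={"_rev": rev, "counter": i}).json()["rev"]
        sizes.append(db_size())

    # trigger compaction and wait for it to finish before the final measurement
    requests.post(f"{DB}/_compact", headers={"Content-Type": "application/json"})
    while requests.get(DB).json().get("compact_running"):
        time.sleep(0.5)
    sizes.append(db_size())
    return sizes

for limit in (1000, 1):   # run 1: default _revs_limit, run 2: _revs_limit = 1
    sizes = run_benchmark(limit)
    print(f"_revs_limit={limit}: size after compaction = {sizes[-1]}")
```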
The first run produces the following plot
The second run produces this plot
For me this is quite unexpected behavior. In the first run I would have expected linear growth, since every change produces a new revision. Once the 1000 revisions are reached, the size should stay constant because the older revisions are discarded. After compaction the size should drop significantly.
In the second run, the first revision should result in a certain database size that is then kept during the following write steps, since every new revision leads to the deletion of the previous one.
I could understand if a little overhead were needed to manage the changes, but this growth behavior seems odd to me. Can anybody explain this phenomenon or correct the assumptions that led to my wrong expectations?
Comments (1)
First off, CouchDB saves some information even for deleted revisions (just the ID and revision identifier), because it needs this for replication purposes.
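You can see this yourself by asking for the document's revision history: revisions whose bodies have already been discarded are still listed, just marked as missing. A small sketch, using the same hypothetical local database as above:

```python
import requests

# revs_info=true returns every revision CouchDB still knows about, with a status:
# "available" if the body is still stored, "missing"/"deleted" if only the
# ID and revision identifier remain.
doc = requests.get("http://localhost:5984/size_benchmark/doc",
                   params={"revs_info": "true"}).json()
for entry in doc.get("_revs_info", []):
    print(entry["rev"], entry["status"])
```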
Second, inserting documents one at a time is suboptimal because of the way the data is saved on disk (see Wikipedia), which could explain the superlinear growth in the first graph.
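If the writes can be batched, the bulk document API avoids part of that per-request cost. A rough sketch, again against the hypothetical database from above:

```python
import requests

# POST /{db}/_bulk_docs writes many documents in a single request instead of
# one HTTP round trip (and one on-disk append) per document.
docs = [{"_id": f"doc-{i}", "counter": i} for i in range(100)]
resp = requests.post("http://localhost:5984/size_benchmark/_bulk_docs",
                     json={"docs": docs})
print(resp.status_code, len(resp.json()))   # one result entry per document
```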