CouchDB: backing up and cloning the database
We're looking at CouchDB for a CMS-ish application. What are some common patterns, best practices and workflow advice surrounding backing up our production database?
I'm particularly interested in the process of cloning the database for use in development and testing.
Is it sufficient to just copy the files on disk out from under a live running instance? Can you clone database data between two live running instances?
Advice and description of the techniques you use will be greatly appreciated.
Comments (6)
CouchDB supports replication, so just replicate to another instance of CouchDB and back up from there. That way the backup does not disturb the instance you write changes to.
https://docs.couchdb.org/en/latest/maintenance/backups.html
You literally send a POST request to your CouchDB instance telling it where to replicate to, and it Works(tm)
EDIT: You can just cp out the .couch files in the data directory from under the running database as long as you can accept the I/O hit.
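As a rough illustration of the replication request described above, here is a minimal Python sketch that builds a POST to CouchDB's /_replicate endpoint. The host names and the database name "cms" are hypothetical, both ends are given as full URLs, and authentication is left out; a secured server would need credentials added.

```python
import json
import urllib.request


def build_replication_request(server_url, source, target, create_target=True):
    """Build (but do not send) a POST to the /_replicate endpoint."""
    body = {"source": source, "target": target, "create_target": create_target}
    return urllib.request.Request(
        url=server_url.rstrip("/") + "/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def replicate(server_url, source, target):
    """Send the replication request and return the decoded JSON response."""
    request = build_replication_request(server_url, source, target)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Hypothetical usage (needs a reachable CouchDB, so it is left commented out):
# replicate(
#     "http://localhost:5984",
#     "http://localhost:5984/cms",
#     "http://backup-host:5984/cms",
# )
```

Building the request separately from sending it makes it easy to inspect the payload before pointing it at a production server.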
Another thing to be aware of is that you can copy files out from under a live database. Given that you may have a possibly large database, you could just copy it OOB from your test/production machine to another machine.
Depending on the write load of the machines, it may be advisable to trigger a replication after the copy to pick up any writes that were in progress when the file was copied. Replicating a few records will still be quicker than replicating the entire database.
For reference see: http://wiki.apache.org/couchdb/FilesystemBackups
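The file copy described above could look roughly like this in Python. It is a sketch that assumes the layout this answer describes, with one <name>.couch file per database directly in the data directory; the paths and the database name are hypothetical. The catch-up replication afterwards is a separate step and is not part of this snippet.

```python
import shutil
from pathlib import Path


def copy_database_file(data_dir, db_name, dest_dir):
    """Copy <data_dir>/<db_name>.couch into dest_dir and return the new path."""
    source = Path(data_dir) / f"{db_name}.couch"
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # copy2 also preserves timestamps and permission bits where it can.
    return Path(shutil.copy2(source, dest_dir / source.name))


# Hypothetical usage:
# copy_database_file("/var/lib/couchdb", "cms", "/mnt/backup/couchdb")
```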
I do it via powershell and the PSCouchDB module with the command Export-CouchDBDatabase.
This exports an entire database to a json file, which you can re-import via the import command (see the link).
For example, the export writes a JSON file to the current directory:
test_05-28-2021_17_01_00.json
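For anyone not on PowerShell, a similar export can be sketched in Python against CouchDB's _all_docs endpoint. This is not how PSCouchDB works internally, just an approximation; the filename pattern only mirrors the example above, and the {"docs": [...]} wrapper is my own choice. Attachment bodies are not fetched by this request, and authentication is left out.

```python
import json
import urllib.request
from datetime import datetime


def export_filename(db_name, when):
    """Name the export like the example above: <db>_MM-DD-YYYY_HH_MM_SS.json."""
    return f"{db_name}_{when.strftime('%m-%d-%Y_%H_%M_%S')}.json"


def export_database(server_url, db_name, when=None):
    """Dump every document of db_name into a JSON file in the current directory."""
    when = when or datetime.now()
    url = f"{server_url.rstrip('/')}/{db_name}/_all_docs?include_docs=true"
    with urllib.request.urlopen(url) as response:
        rows = json.load(response)["rows"]
    docs = [row["doc"] for row in rows]
    path = export_filename(db_name, when)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump({"docs": docs}, handle, indent=2)
    return path


# Hypothetical usage (needs a reachable CouchDB, so it is left commented out):
# export_database("http://localhost:5984", "test")
```

The exported documents keep their _rev fields, so pushing them back into an existing database may need those handled first.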
CouchDB replication is horrible. I generally do tar which is much better.
... the subdirectories that start with . when you archive the files.
scp the tar.gz file to the destination host and unpack it in a temporary location there.
chown the files to the user and group that own the files already in the database directory on the destination. This is likely couchdb:couchdb. This is important, as messing up the file permissions is the only way I've managed to mess up this process so far.
cp the files into the destination directory. Again on my hosts this has been /var/lib/couchdb.
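The archive step above can also be done from Python with the standard tarfile module. This is only a sketch of that one step, with hypothetical paths; the scp, chown and cp steps are not covered. Adding the directory itself means the subdirectories that start with . end up in the archive too.

```python
import tarfile
from pathlib import Path


def archive_data_dir(data_dir, archive_path):
    """Write a gzip-compressed tar of data_dir, hidden subdirectories included."""
    data_dir = Path(data_dir)
    with tarfile.open(archive_path, "w:gz") as archive:
        # Adding the directory recurses into every entry it contains,
        # so dot-prefixed subdirectories are picked up as well.
        archive.add(data_dir, arcname=data_dir.name)
    return Path(archive_path)


def list_archive(archive_path):
    """Return the sorted member names, handy for checking what was captured."""
    with tarfile.open(archive_path, "r:gz") as archive:
        return sorted(archive.getnames())


# Hypothetical usage:
# archive_data_dir("/var/lib/couchdb", "/tmp/couchdb-backup.tar.gz")
```

Write the archive somewhere outside the data directory so it does not try to include itself.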
CouchDB also works very nicely with filesystem snapshots offered by modern filesystems like ZFS. Since the database file always is in a consistent state you can take the snapshot of the file any time without weakening the integrity guarantees provided by CouchDB.
This results in nearly no I/O overhead. In case you have e.g. accidentally deleted a document from the database you can move the snapshot to another machine and extract the missing data there. You might even be able to replicate back to the production database, but I never have tried that.
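Taking such a snapshot can be scripted. Below is a small Python sketch around the zfs snapshot command; the dataset name tank/couchdb and the couchdb- prefix in the snapshot name are made up for illustration, and it assumes the CouchDB data directory lives on its own ZFS dataset and that the script has the rights to snapshot it.

```python
import subprocess
from datetime import datetime


def snapshot_name(dataset, when):
    """Build a timestamped snapshot name such as tank/couchdb@couchdb-20210528-170100."""
    return f"{dataset}@couchdb-{when.strftime('%Y%m%d-%H%M%S')}"


def take_snapshot(dataset, when=None):
    """Run `zfs snapshot <dataset>@<name>` and return the snapshot name."""
    name = snapshot_name(dataset, when or datetime.now())
    subprocess.run(["zfs", "snapshot", name], check=True)
    return name


# Hypothetical usage (needs ZFS and suitable privileges):
# take_snapshot("tank/couchdb")
```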
But always make sure you use exactly the same couchdb revisions when moving around database files. The on-disk format is still evolving in incompatible ways.
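One way to check this before moving files is to compare the version each server reports at its root URL, which returns a small JSON welcome document containing a "version" field. A minimal sketch, with hypothetical host names and no authentication:

```python
import json
import urllib.request


def fetch_welcome(server_url):
    """Fetch the JSON welcome document a CouchDB server returns at its root URL."""
    with urllib.request.urlopen(server_url) as response:
        return json.load(response)


def versions_match(source_welcome, target_welcome):
    """True when both welcome documents report exactly the same version string."""
    return source_welcome["version"] == target_welcome["version"]


# Hypothetical usage (needs two reachable servers, so it is left commented out):
# ok = versions_match(
#     fetch_welcome("http://prod-host:5984/"),
#     fetch_welcome("http://dev-host:5984/"),
# )
```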
I'd like to second Paul's suggestion: Just cp your database files from under the live server if you can take the I/O-load hit. If you run a replicated copy anyway, you can safely copy from that too, without impacting your master's performance.