Rsync a single (archive) file that changes every time
I am working on an open source backup utility that backs up files and transfers them to various external locations such as Amazon S3, Rackspace Cloud Files, Dropbox, and remote servers through FTP/SFTP/SCP protocols.
Now, I have received a feature request for incremental backups (in case the backups are large and become expensive to transfer and store). I have been looking around and someone mentioned the rsync utility. I ran some tests with it but am unsure whether it is suitable, so I would like to hear from anyone who has some experience with rsync.
Let me give you a quick rundown of what happens when a backup is made. Basically, it starts by dumping databases such as MySQL, PostgreSQL, MongoDB, and Redis. It might also take a few regular files (like images) from the file system. Once everything is in place, it bundles it all into a single .tar (and additionally compresses and encrypts it using gzip and openssl).
Once that's all done, we have a single file that looks like this: mybackup.tar.gz.enc
Now I want to transfer this file to a remote location. The goal is to reduce the bandwidth and storage cost. So let's assume this little backup package is about 1GB in size. We use rsync to transfer it to a remote location and remove the backup file locally. Tomorrow a new backup file will be generated, and it turns out that a lot more data has been added in the past 24 hours, so the new mybackup.tar.gz.enc file is now about 1.2GB in size.
Now, my question is: is it possible to transfer just the 200MB that got added in the past 24 hours? I tried the following command:
rsync -vhP --append mybackup.tar.gz.enc backups/mybackup.tar.gz.enc
The result:
mybackup.tar.gz.enc 1.20G 100% 36.69MB/s 0:00:46 (xfer#1, to-check=0/1)
sent 200.01M bytes  received 849.40K bytes  8.14M bytes/sec
total size is 1.20G  speedup is 2.01
Looking at the "sent 200.01M bytes" I'd say the "appending" of the data worked properly. What I'm wondering now is whether it transferred the whole 1.2GB in order to figure out how much and what to append to the existing backup, or whether it really only transferred the 200MB. Because if it transferred the whole 1.2GB, then I don't see how it's much different from using the scp utility on single large files.
Also, if what I'm trying to accomplish is at all possible, what flags do you recommend? If it's not possible with rsync, is there any utility you can recommend instead?
Any feedback is much appreciated!
Comments (3)
The nature of gzip is such that small changes in the source file can result in very large changes to the resultant compressed file: gzip makes its own decisions each time about the best way to compress the data that you give it.
Some versions of gzip have the --rsyncable switch, which makes gzip reset its compression state periodically at points determined by the input data. This results in slightly less efficient compression (in most cases) but limits the changes in the output file to roughly the same area as the changes in the source file.
If that's not available to you, then it's typically best to rsync the uncompressed file (using rsync's own compression if bandwidth is a consideration) and compress at the end (if disk space is a consideration). Obviously this depends on the specifics of your use case.
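To see the effect for yourself, here is a small Python sketch. It is only an illustration under stated assumptions: it uses zlib (the same DEFLATE compression gzip uses) on made-up sample data, and it counts aligned 4096-byte blocks that two files have in common, which is a much cruder comparison than rsync's rolling match. The helper names (sample_data, shared_block_fraction, flip_one_byte) and the block size are invented for this example.

```python
import hashlib
import random
import zlib

BLOCK = 4096  # illustrative block size, not rsync's actual choice


def sample_data(lines: int = 5000, seed: int = 0) -> bytes:
    """Build repeatable, compressible text that stands in for a dump file."""
    rng = random.Random(seed)
    words = ["backup", "mysql", "redis", "image", "dump", "table", "index", "row"]
    text = "\n".join(
        " ".join(rng.choice(words) for _ in range(8)) for _ in range(lines)
    )
    return text.encode("ascii")


def blocks(data: bytes) -> list:
    """Split data into aligned fixed-size blocks (the last one may be short)."""
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def shared_block_fraction(old: bytes, new: bytes) -> float:
    """Fraction of aligned blocks in `new` that also occur somewhere in `old`."""
    known = {hashlib.sha256(b).digest() for b in blocks(old)}
    new_blocks = blocks(new)
    hits = sum(1 for b in new_blocks if hashlib.sha256(b).digest() in known)
    return hits / len(new_blocks)


def flip_one_byte(data: bytes, index: int = 100) -> bytes:
    """Return a copy of data with a single byte changed in place."""
    changed = bytearray(data)
    changed[index] ^= 0x01
    return bytes(changed)


if __name__ == "__main__":
    old_raw = sample_data()
    new_raw = flip_one_byte(old_raw)
    old_gz = zlib.compress(old_raw)
    new_gz = zlib.compress(new_raw)
    print("raw blocks shared:       ", shared_block_fraction(old_raw, new_raw))
    print("compressed blocks shared:", shared_block_fraction(old_gz, new_gz))
```

Comparing the two printed fractions shows how much of the compressed output survives a one-byte change in the input. The exact numbers depend on the data and the zlib version, so treat them as indicative only.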
It sent only what it says it sent. Transferring only the changed parts is one of the major features of rsync. It uses some rather clever checksumming algorithms (and it sends those checksums over the network, but this is negligible: several orders of magnitude less data than transferring the file itself; in your case, I'd assume that's the .01 in 200.01M) and transfers only the parts it needs.
Note also that there are already quite powerful backup tools built on the rsync algorithm, namely Duplicity (which uses librsync). Depending on the license of your code, it may be worthwhile to see how they do this.
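For anyone curious what the "clever checksumming" looks like, below is a heavily simplified Python sketch of the idea: the receiver describes its old file as per-block checksums, and the sender slides a window over the new file, using a cheap rolling checksum to find candidate blocks and a strong hash to confirm them. This is not rsync's actual code. The 4-byte block size, the 16-bit checksum layout, the use of MD5 as the strong hash, and all function names are assumptions made for illustration.

```python
import hashlib

BLOCK = 4  # tiny block size so the example is easy to follow


def weak(data: bytes) -> tuple:
    """Cheap checksum of one window: (plain sum, position-weighted sum)."""
    n = len(data)
    a = sum(data) & 0xFFFF
    b = sum((n - i) * x for i, x in enumerate(data)) & 0xFFFF
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> tuple:
    """Slide the window one byte to the right without re-summing it."""
    a = (a - out_byte + in_byte) & 0xFFFF
    b = (b - n * out_byte + a) & 0xFFFF
    return a, b


def make_signature(old: bytes) -> dict:
    """Receiver side: map weak checksum -> [(strong hash, block index)]."""
    sig = {}
    for off in range(0, len(old) - BLOCK + 1, BLOCK):
        block = old[off:off + BLOCK]
        entry = (hashlib.md5(block).digest(), off // BLOCK)
        sig.setdefault(weak(block), []).append(entry)
    return sig


def delta(sig: dict, new: bytes) -> list:
    """Sender side: emit ("copy", block_index) or ("data", literal_bytes)."""
    ops, literal = [], bytearray()
    i, n = 0, len(new)
    cur = weak(new[:BLOCK]) if n >= BLOCK else None
    while i + BLOCK <= n:
        match = None
        for strong, idx in sig.get(cur, ()):
            if hashlib.md5(new[i:i + BLOCK]).digest() == strong:
                match = idx
                break
        if match is not None:
            if literal:
                ops.append(("data", bytes(literal)))
                literal.clear()
            ops.append(("copy", match))
            i += BLOCK
            if i + BLOCK <= n:
                cur = weak(new[i:i + BLOCK])
        else:
            literal.append(new[i])
            if i + BLOCK < n:
                cur = roll(cur[0], cur[1], new[i], new[i + BLOCK], BLOCK)
            i += 1
    literal.extend(new[i:])  # whatever is left is shorter than one block
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def apply_delta(old: bytes, ops: list) -> bytes:
    """Receiver side: rebuild the new file from old blocks plus literals."""
    out = bytearray()
    for kind, value in ops:
        if kind == "copy":
            out += old[value * BLOCK:(value + 1) * BLOCK]
        else:
            out += value
    return bytes(out)


if __name__ == "__main__":
    old = b"the quick brown fox jumps over the lazy dog"
    new = old[:10] + b"XX" + old[10:] + b" again"
    ops = delta(make_signature(old), new)
    sent = sum(len(v) for kind, v in ops if kind == "data")
    print("literal bytes sent:", sent, "of", len(new))
    print("rebuilt correctly: ", apply_delta(old, ops) == new)
```

Only the "data" entries carry file contents across the network; the "copy" entries are just block numbers. That is why an unchanged region costs almost nothing to sync, and also why a file whose bytes all change (as compressed or encrypted output tends to) gains little.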
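Since rsync 3.0.0, --append WILL BREAK your file contents if any of the existing data has changed, because it no longer verifies the data already on the receiving side. Use --append-verify if you need that check.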
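A toy Python model of why that happens. This is not how rsync is implemented; it only mimics the documented behaviour. The function names are made up, and the verified variant compares the prefix directly, whereas rsync relies on a whole-file checksum.

```python
def append_only(source: bytes, dest: bytes) -> bytes:
    """Blind append: trust the existing bytes, add only what lies past them."""
    return dest + source[len(dest):]


def append_with_verify(source: bytes, dest: bytes) -> bytes:
    """Verified append: fall back to a full resend if the old data differs."""
    if source[:len(dest)] == dest:
        return dest + source[len(dest):]
    return source


if __name__ == "__main__":
    on_server = b"AAAABBBB"
    grown = on_server + b"CCCC"   # old data untouched, new data appended
    rewritten = b"AAXABBBBCCCC"   # one byte of the old data changed as well

    print(append_only(grown, on_server) == grown)
    print(append_only(rewritten, on_server) == rewritten)
    print(append_with_verify(rewritten, on_server) == rewritten)
```

In the second case the blind append leaves the stale byte in place on the destination, so the two copies silently disagree. A rebuilt .tar.gz.enc will generally not share a prefix with yesterday's file, which is the situation this comment warns about.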