通过网络高效复制文件的算法

发布于 2024-12-19 10:59:47 字数 926 浏览 7 评论 0原文

我知道有几个程序可以通过网络同步文件。他们没有一个做我一直在想的事情。让我解释一下我想要实现的目标...

在我的网络中，多台计算机共享相同的文件。例如，quickbooks 文件被多台计算机访问，并且它是一个大文件。还有 Outlook Large 中的 PST 文件。每天晚上我们都会通过网络创建已更改文件的备份。我认为如果有一些小的修改，复制整个 1 GB 文件是没有意义的。所以我想提出一种算法来比较文件的各个部分。

例如，假设 Outlook pst 文件由字节组成：

1, 2, 3, 4, 5, 6, 7, 8, 9

如果我收到一封电子邮件，字节现在将是：

1, 2, 3, 4, 5, 6, 7, 8, 9, 10    for example

现在不用发送整个文件，只发送字节 10 会更容易，

所以实际上该文件有数千个字节所以我将对文件的每个兆字节进行校验和，所以现在我的表应该如下所示：

aaa1, aaa2, aaa3, abf8, etc...

如果现在收到电子邮件时，pst 文件的表如下：

aaa1, aaa2, aaa3,  7a8b, etc  ... then I know that the first 3 megabits are the same and I should send just one megabite instead of the entire file...

我认为如果内容是在邮件末尾添加的，则此算法将非常有效文件，但实际上一个字节可能会被改变在文件的开头，我的算法将不起作用。例如，如果在文件开头添加一个字节，所有十六进制代码都会改变......

我怎样才能使算法更有效？如果我可以发送文件的一部分而不是整个文件，那就太好了

原文

I know there are several programs out there that will sync files over the network. Non of them do what I have been thinking of. Let me explain what I want to achieve...

In my network several computers share the same files. for example the quickbooks file is accessed by several computers and it is a large file. also there are the pst files from outlook large as well. every night we create a backup over the network of the files that have been changed. I think it does not make sanse to copy a whole 1 gb file if it had some minor modification. so I want to come up with an algorithm that will compare parts of files.

for example let's say that the outlook pst file consists of bytes:

1, 2, 3, 4, 5, 6, 7, 8, 9

if I receive an email the bytes will now be:

1, 2, 3, 4, 5, 6, 7, 8, 9, 10    for example

now instead of sending the whole file it will be easier to send just the byte 10

so in reality the file has thousands of bytes so I will do the checksum of every megabyte of the file so now my table should look like:

aaa1, aaa2, aaa3, abf8, etc...

if when receiving an email now the pst file has a table as:

aaa1, aaa2, aaa3,  7a8b, etc  ... then I know that the first 3 megabits are the same and I should send just one megabite instead of the entire file...

I think this algorithm will work great if content was added towards the end of the file but in reality a byte may be changed at the beginning of the file and my algorithm is not going to work. for example if one byte is added at the begining of the file all the hex codes will change...

how can I make the algorithm more efficient? It will be nice if I could send parts of the file instead of the whole file

分享到QQ

分享到微博