Can rsync do one-to-many synchronization?

Published 2024-10-06 15:35:12 · 163 characters · 0 views · 0 comments

Can I sync changes from a "model" site that I work on, across hundreds of sites on the SAME server using rsync?
I would be updating common template files and JS scripts. If possible, how would I set this up?
(I'm on a Hostgator Dedicated server, running Apache)

Comments (1)

蓝眸 2024-10-13 15:35:12

Read my extended answer for the edited question below.

The most trivial and naive approach would probably be to set up a script that simply runs rsync once for every server you want to synchronize.
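Such a wrapper can be as small as a loop. A sketch, where the host names and paths are placeholders for your own, and the leading echo makes it a dry run:

```shell
#!/bin/sh
# Naive fan-out: one rsync invocation per target server.
# Hosts and paths are placeholders -- substitute your own.
# The leading "echo" makes this a dry run; remove it to actually push.
SRC=/var/www/model-site/            # trailing slash: sync the directory's contents
PUSHED=""
for host in web02 web03 web04; do
  echo rsync -az --delete "$SRC" "root@$host:/var/www/site/"
  PUSHED="$PUSHED$host "            # simple log of what was attempted
done
```

Here -a preserves permissions and timestamps, -z compresses over the wire, and --delete makes each target an exact mirror (files removed from the model are removed remotely too).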

This is fine in most cases, but I don't think this is what you are looking for, because you would have figured that out yourself...

This method also has the following disadvantages:

  • One server sends all the traffic; there is no cascading. So it's a single point of failure and a bottleneck

  • It is very inefficient. Rsync is a great tool, but parsing the file list and checking for differences is not really quick if you want to synchronize hundreds of servers

But what can you do?

Configuring rsync for multiple servers is obviously the easiest way to go. So you should start with that and optimize where your problems are.

You can speed it up, for example, by using the right filesystem; XFS can be many times faster than ext3 for this kind of metadata-heavy work.

You can also use unison, a somewhat more powerful tool that keeps its file list in a cache.

You can also set up a cascade (Server A synchronizing to Server B synchronizing to Server C).

You could also set up pulling by your clients rather than pushing. You could have a subdomain for that, pointing at a load balancer with one or more servers behind it, which you synchronize by pushing from your source server.
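For the pull variant, each client could, for instance, run a cron job against an rsync daemon on the source. A sketch (the hostname and module name are invented, and it assumes an rsync daemon exporting a "model" module):

```
# crontab entry on each client: pull the model site every 5 minutes
*/5 * * * * rsync -az --delete rsync://source.example.com/model/ /var/www/site/
```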

The reason I am telling you all this is that there is no single perfect way to go; you have to figure it out depending on your needs.

However, I would definitely recommend looking into Git.

Git is a very powerful and efficient version control system.

You could create a git repository and push it out to your client machines.

It works very well, is efficient, and is flexible and scalable, so you can build almost anything on top of this structure, including distributed file systems, cascades, load balancing, etc.
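The classic shape of this is a bare repository plus a post-receive hook that checks the pushed tree out into the docroot. A sketch, using scratch directories so it is safe to try (in real life the paths would be something like /srv/git/site.git and /var/www/site):

```shell
#!/bin/sh
# Git-based deployment sketch: push to a bare repo, a post-receive
# hook checks the files out into the docroot. All paths are scratch
# stand-ins created with mktemp.
set -e
BASE="$(mktemp -d)"
REPO="$BASE/site.git"      # bare repo the model site pushes to
DEPLOY="$BASE/www"         # docroot the hook checks the files out into
mkdir -p "$DEPLOY"

git init --bare "$REPO"
# unquoted heredoc so $DEPLOY is expanded when the hook is written
cat > "$REPO/hooks/post-receive" <<EOF
#!/bin/sh
# runs on every push: check the pushed tree out into the docroot
GIT_WORK_TREE=$DEPLOY git checkout -f main
EOF
chmod +x "$REPO/hooks/post-receive"

# simulate the "model" site pushing a change
WORK="$BASE/model"
git init -b main "$WORK"
echo "console.log('hi')" > "$WORK/app.js"
git -C "$WORK" add app.js
git -C "$WORK" -c user.name=model -c user.email=model@example.com \
    commit -m "update templates"
git -C "$WORK" push "$REPO" main   # hook fires, $DEPLOY now has app.js
```

On a real server each site (or each target machine) would be a remote you push to, or would pull from the central bare repo itself.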

Hope this gives you some pointers in the right direction to look into.

Edit:

So it looks like you want to synchronize changes on the same server, or even the same hard disk (I don't know which, but it matters a lot for the options you have).

Well, basically it's all the same: insert, overwrite, delete...
Rsync is a great tool for that too, because it transfers changes incrementally; it doesn't just resume broken transfers.

But I would say it completely depends on the content.

If you have a lot of small files, such as the templates and JavaScript you mention, rsync may be very slow. It might even be faster to delete the destination folder completely and copy the files there again, so that rsync (or any other tool) doesn't have to check every file for changes.

You could also just copy everything with cp -rf so that everything gets overwritten, but then the destination can still contain old files that were deleted from the source.
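The difference between the two shows up quickly. A scratch-directory sketch (the paths are temporary stand-ins for real docroots):

```shell
#!/bin/sh
# Contrast cp -rf overwriting with delete-and-copy mirroring,
# using a scratch directory in place of real docroots.
set -e
ROOT="$(mktemp -d)"
mkdir -p "$ROOT/model" "$ROOT/site"
echo "new" > "$ROOT/model/app.js"
echo "old" > "$ROOT/site/stale.js"      # long since deleted from the model

# 1) cp -rf: overwrites matching files, but stale.js survives
cp -rf "$ROOT/model/." "$ROOT/site/"
if [ -e "$ROOT/site/stale.js" ]; then STALE_AFTER_CP=yes; else STALE_AFTER_CP=no; fi

# 2) delete-and-copy: an exact mirror, stale files are gone
rm -rf "$ROOT/site"
cp -a "$ROOT/model" "$ROOT/site"
```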

I also know many cases where such stuff is done using Subversion, because people feel they have more control that way, or something, I dunno. It's also more flexible.

However there is one thing that you should think of:

There is the concept of shared data.

There are symlinks and hard links.

You can put them on files and folders (hard links only on files; hard links to directories are disallowed because they could create cycles in the directory tree).

If you put a symlink A on a target B, the file looks like it is located and named like the symlink, but the resource behind it is somewhere completely different.
Applications CAN distinguish, though. Apache, for example, has to be configured to follow symlinks (otherwise it would be a security issue).

So if your changes are all in one folder, you could just put a symlink named like that folder, pointing to your folder, and you never have to worry about synchronizing again, because they share the very same resource.
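A sketch of that layout, with a scratch directory standing in for /var/www and invented site names:

```shell
#!/bin/sh
# One real template folder, shared by every site through symlinks.
# A scratch directory stands in for /var/www; site names are invented.
set -e
ROOT="$(mktemp -d)"
mkdir -p "$ROOT/model/templates"
echo "<header>shared</header>" > "$ROOT/model/templates/header.html"

for site in siteA siteB siteC; do
  mkdir -p "$ROOT/$site"
  # each site's "templates" entry is only a pointer to the model's copy
  ln -s "$ROOT/model/templates" "$ROOT/$site/templates"
done
```

Edit a file under model/templates once and every site sees the change instantly. Remember that Apache needs Options FollowSymLinks (or SymLinksIfOwnerMatch) enabled on those directories for the linked content to be served.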

However there are reasons why you wouldn't want to do so:

  • They look different. That sounds absurd, but really, it is the most common reason people don't like symlinks. They complain because the links "look so weird in their program" or whatever.

  • Symlinks are limited in certain capabilities, but in exchange have other huge advantages, like pointing across filesystems. Almost every disadvantage can be dealt with and worked around quite well in your application. The pitiful truth is that symlinks are a fundamental feature of Linux operating systems and filesystems, but their existence is sometimes forgotten when developing an application. It's like developing a train but forgetting that there are also people with long legs, or something...

Hard links, on the other hand, do look exactly like files, because they are files.

And every hardlink pointing to one file is that very file.

It sounds confusing but think of it as follows:

Every file is some data on the disk, referenced by an inode; a directory entry with some name then points to that inode.

Hard links are just that: multiple directory entries ("listings") for the same file.

As a consequence, they share the same locks and get modified/deleted etc. together.

This, however, can of course only be done within one filesystem/device, not across devices.
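A small demonstration that two hard-linked names really are one file (scratch directory, invented file names):

```shell
#!/bin/sh
# Two hard-linked names are one file: edit through one name,
# read the change through the other.
set -e
DIR="$(mktemp -d)"
echo "v1" > "$DIR/original.js"
ln "$DIR/original.js" "$DIR/siteA.js"   # a hard link, not a copy
echo "v2" > "$DIR/original.js"          # truncate and rewrite in place
cat "$DIR/siteA.js"                     # prints v2: same inode, same bytes
```

One practical caveat: this only holds while the file is updated in place. Tools that replace a file by writing a temporary copy and renaming it over the old name (rsync's default behaviour, and many editors) give the name a new inode and silently break the hard link.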

Links have some big advantages, and they are quite obvious:

You don't have duplicate data, which eliminates the potential for inconsistencies; you don't have to update every copy, and you need less disk space.

Sometimes, however, this has far more significance.

For example, if you run multiple websites and all of them use the Zend Framework.

It is a huge framework, and its opcode cache will fill up something like 50 MB of your RAM.

If your websites share the same Zend library folder, you need that only once.
