What is the best way to keep multiple Linux servers synced?

I have several different locations in a fairly wide area, each with a Linux server storing company data. This data changes every day in different ways at each different location. I need a way to keep this data up-to-date and synced between all these locations.

For example:

In one location someone places a set of images on their local server. In another location, someone else places a group of documents on their local server. A third location adds a handful of both images and documents to their server. In two other locations, no changes are made to their local servers at all. By the next morning, I need the servers at all five locations to have all those images and documents.

My first instinct is to use rsync and a cron job to do the syncing overnight (1 a.m. to 6 a.m. or so), when none of the bandwidth at our locations is being used. It seems to me that it would work best to have one server be the "central" server, pulling in all the files from the other servers first. Then it would push those changes back out to each remote server? Or is there another, better way to perform this function?
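
To make that concrete, here is roughly the hub-and-spoke script I had in mind for the central server. The hostnames and paths are made up, and it assumes passwordless SSH keys are already set up between the machines:

    #!/bin/sh
    # sync-all.sh - run from cron on the "central" server during the night.
    # Hosts and paths are placeholders; assumes passwordless SSH keys.
    REMOTES="site1 site2 site3 site4"
    DATA=/srv/company-data

    # Pass 1: pull everything new from each remote into the central copy.
    for host in $REMOTES; do
        rsync -az "$host:$DATA/" "$DATA/"
    done

    # Pass 2: push the merged tree back out to every remote.
    for host in $REMOTES; do
        rsync -az "$DATA/" "$host:$DATA/"
    done

A crontab entry such as 0 1 * * * /usr/local/bin/sync-all.sh would kick it off at 1 a.m.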

Comments (8)

一张白纸 2024-07-12 21:37:55

One thing you could (theoretically) do is create a script using Python or something and the inotify kernel feature (through the pyinotify package, for example).

You can run the script, which registers to receive events on certain directory trees. Your script could then watch those directories and update all the other servers as things change on each one.

For example, if someone uploads spreadsheet.doc to the server, the script sees it instantly; if the document doesn't get modified or deleted within, say, 5 minutes, the script could copy it to the other servers (e.g. through rsync).

A system like this could theoretically implement a sort of limited 'filesystem replication' from one machine to another. Kind of a neat idea, but you'd probably have to code it yourself.
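
A minimal sketch of that idea, assuming the pyinotify package is installed; the watch path and peer hostnames are placeholders, and a real version would add the 5-minute quiet period described above:

    #!/usr/bin/env python
    import os
    import subprocess
    import pyinotify  # inotify bindings for Python

    WATCH_DIR = '/srv/company-data'   # hypothetical local data root
    PEERS = ['site2', 'site3']        # hypothetical remote hosts

    class SyncHandler(pyinotify.ProcessEvent):
        def process_IN_CLOSE_WRITE(self, event):
            # A file was written and closed; push it to every peer.
            # The /./ marker plus -R (--relative) makes rsync recreate
            # the file's path relative to WATCH_DIR on the remote side.
            rel = os.path.relpath(event.pathname, WATCH_DIR)
            src = '%s/./%s' % (WATCH_DIR, rel)
            for peer in PEERS:
                subprocess.call(['rsync', '-azR', src,
                                 '%s:%s/' % (peer, WATCH_DIR)])

    wm = pyinotify.WatchManager()
    wm.add_watch(WATCH_DIR, pyinotify.IN_CLOSE_WRITE, rec=True)
    notifier = pyinotify.Notifier(wm, SyncHandler())
    notifier.loop()  # blocks, dispatching events as they arrive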

|煩躁 2024-07-12 21:37:55

An alternative if rsync isn't the best solution for you is Unison. Unison works under Windows, and it has some features for handling cases where changes have been made on both sides (so you don't necessarily need to pick one server as the primary, as you've suggested).

Depending on how complex the task is, either may work.
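
For example, a two-way sync of one tree over SSH might look like this (the paths and hostname are placeholders):

    # Propagate non-conflicting changes in both directions without
    # prompting; genuinely conflicting files are skipped and reported.
    unison /srv/company-data ssh://site2//srv/company-data -batch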

爱的那么颓废 2024-07-12 21:37:55

The way I do it (on Debian/Ubuntu boxes):

  • Use dpkg --get-selections to get your installed packages
  • Use dpkg --set-selections to install those packages from the list created
  • Use a source control solution to manage the configuration files. I use git in a centralized fashion, but subversion could be used just as easily.
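
As a sketch (the file name is arbitrary, and the git step assumes /etc is already a checkout with a central remote):

    # On the reference machine: record the installed-package list.
    dpkg --get-selections > packages.list

    # On each target machine, after copying packages.list over:
    dpkg --set-selections < packages.list
    apt-get dselect-upgrade   # installs everything marked in the list

    # Config files: snapshot /etc into the central repository.
    cd /etc && git add -A && git commit -m "config snapshot" && git push
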
夕色琉璃 2024-07-12 21:37:55

AFAIK, rsync is your best choice; it supports partial file updates, among a variety of other features. Once set up, it is very reliable. You can even set up the cron job with timestamped log files to track what is updated in each run.
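
For instance, a crontab entry like this keeps one dated log file per run (paths are examples; note that % must be escaped as \% inside a crontab):

    # Nightly one-way rsync at 1 a.m. with a per-run, dated log file.
    0 1 * * * rsync -avz /srv/company-data/ site2:/srv/company-data/ >> /var/log/rsync-$(date +\%F).log 2>&1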

蒲公英的约定 2024-07-12 21:37:55

I don't know how practical this is, but a source control system might work here. At some point (perhaps each hour?) during the day, a cron job runs a commit, and overnight, each machine runs a checkout. You could run into issues with a long commit not being done when a checkout needs to run, but essentially the same thing could be done with rsync.

I guess what I'm thinking is that a central server would make your sync operation easier - conflicts can be handled once on central, then pushed out to the other machines.
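
With Subversion, for example, the cron side could be as simple as this rough sketch (it assumes working copies already exist at each location, and the times are arbitrary):

    # Hourly, at each location: pick up new files and commit them.
    0 * * * * cd /srv/company-data && svn add --force -q . && svn commit -q -m "hourly auto-commit"

    # Overnight, at each location: pull down everyone else's commits.
    0 3 * * * cd /srv/company-data && svn update -q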

夏日落 2024-07-12 21:37:55

rsync would be your best choice. But you need to carefully consider how you are going to resolve conflicts between updates to the same data on different sites. If site-1 has updated 'customers.doc' and site-2 has a different update to the same file, how are you going to resolve it?
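
rsync itself has a few flags that at least avoid silently losing one side's version; for example (paths are placeholders):

    # --update skips files that are already newer on the receiver, and
    # --backup/--suffix keeps a renamed copy of anything it overwrites.
    rsync -az --update --backup --suffix=.conflict \
        /srv/company-data/ site2:/srv/company-data/

That only flags collisions, though; someone still has to merge them by hand.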

时光无声 2024-07-12 21:37:55

I have to agree with Matt McMinn; especially since it's company data, I'd use source control and, depending on the rate of change, run it more often.

I think the central clearinghouse is a good idea.

农村范ル 2024-07-12 21:37:55

It depends upon the following:

  • How many servers/computers need to be synced?

    • If there are too many servers, using rsync becomes a problem.
    • Either you use threads and sync to multiple servers at the same time, or you go one after the other. So you are looking at high load on the source machine in the first case, or inconsistent data across the servers (in a cluster) at a given point in time in the latter case. (See the sketch at the end of this answer for the parallel variant.)

  • Size of the folders that need to be synced and how often they change

    • If the data is huge, then rsync will take time.

  • Number of files

    • If the number of files is large, and especially if they are small files, rsync will again take a lot of time.

So whether to use rsync, NFS, or version control all depends on the scenario.

  • If there are few servers and just a small amount of data, then it makes sense to run rsync every hour. You can also package content into an RPM if the data changes only occasionally.

With the information provided, IMO version control will suit you best.

rsync/scp might give problems if two people upload different files with the same name. NFS over multiple locations needs to be architected with perfection.

Why not have a single repository (or multiple repositories) that everyone just commits to? All you need to do is keep the repositories in sync. If the data is huge and updates are frequent, then your repository server will need a good amount of RAM and a good I/O subsystem.
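
For the parallel case mentioned above, plain shell job control is usually enough (hosts and paths are placeholders):

    # Fan out to all peers at once instead of one after the other.
    for host in site1 site2 site3 site4; do
        rsync -az /srv/company-data/ "$host:/srv/company-data/" &
    done
    wait   # block until every background rsync has finished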
