Collecting files from multiple machines?
I have many machines (20+) connected in a network. Each machine accesses a central database, queries it, processes the information queried, and then writes the results to files on its local hard drive.
Following the processing, I'd like to be able to 'grab' all these files (from all the remote machines) back to the main machine for storage.
I thought of three possible ways to do so:
(1) rsync to each remote machine from the main machine, and 'ask' for the files
(2) rsync from every remote machine to the main machine, and 'send' the files
(3) create an NFS share on each remote machine, which the main machine can access to read the files (no 'rsync' is needed in that case)
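For example, option (1) could look roughly like the following, run from the main machine (the hostnames, user, and paths below are just placeholders for illustration):

    #!/bin/bash
    # Pull the result files from every remote machine over SSH.
    # Assumes passwordless SSH keys are already set up for the "worker" user.
    for host in node01 node02 node03; do    # placeholder hostnames
        rsync -avz "worker@${host}:/var/results/" "/srv/collected/${host}/"
    done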
Is one of these ways better than the others? Are there better ways I am not aware of?
All machines use Ubuntu 10.04LTS. Thanks in advance for any suggestions.
Answers (3)
You could create one NFS share on the master machine and have each remote machine mount that. Seems like less work.
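A rough sketch of how that could be set up on Ubuntu 10.04 (the export path, subnet, and hostname are only examples):

    # On the main machine (needs the nfs-kernel-server package):
    # add an export line for the collection directory, then reload the exports
    echo '/srv/collected 192.168.1.0/24(rw,sync,no_subtree_check)' | sudo tee -a /etc/exports
    sudo exportfs -ra

    # On each remote machine (needs nfs-common): mount the share and write results into it
    sudo mkdir -p /mnt/collected
    sudo mount -t nfs mainhost:/srv/collected /mnt/collected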
Performance-wise, it's practically the same. You are still sending files over a (relatively) slow network connection.
Now, I'd say which approach you take depends on where you want to handle errors or irregularities. If you want the responsibility to lie with your processing computers, use rsync back to the main one; or the other way round if you want the main one to handle assembling the data and ensuring everything is in order.
As for the shared space approach, I would create a share on the main machine, and have the others write to it. They can start as soon as the processing finishes, ensure the file is transferred correctly, and then verify checksums or whatever.
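As a sketch of that idea (file names and paths are placeholders): each client copies its result into the share along with a checksum recorded from the local original, so the main machine can verify the transfer afterwards:

    # On a remote machine, once processing has finished:
    HOST=$(hostname)
    cp results.dat "/mnt/collected/results-${HOST}.dat"
    # record the checksum of the local original under the name used on the share
    md5sum results.dat | sed "s|results.dat|results-${HOST}.dat|" > "/mnt/collected/results-${HOST}.dat.md5"

    # On the main machine, verify everything that has arrived:
    cd /srv/collected && md5sum -c -- *.md5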
I would prefer option (2), since you know when the processing is finished on the client machine. You could use the same SSH key on all client machines, or collect the different keys in the authorized_keys file on the main machine. It's also more reliable: if the main machine is unavailable for some reason, you can still sync the results later, whereas in the NFS setup the clients would be blocked.
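A minimal sketch of that push step, run at the end of each client's processing script (the user, host, and paths are assumptions, not from the question):

    #!/bin/bash
    # Push this machine's results to the main machine over SSH.
    # Assumes this client's public key is in the collector's authorized_keys on mainhost.
    rsync -avz --partial /var/results/ "collector@mainhost:/srv/collected/$(hostname)/"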