Versioning file system with Amazon S3 as backend
I'm trying to make the following work on my Debian computers and one OS X laptop.
What I would like to have is some kind of versioning file system that uses Amazon S3 as a backend.
What I was thinking is to use s3fs (using FUSE) to mount the bucket, then build a filesystem on top of Git that makes a new commit every time I write a file (I would like a complete version history going back x days). The mounted folder should then show the latest version of the files.
One of the problems I don't know how to solve (due to a lack of experience, I assume) is that I would like to synchronise the files with a local folder. Of course, I could just download all the files, but that is not bandwidth-friendly.
Another problem is that the current version of s3fs does not seem to work with MacFUSE.
Further, although it will probably not happen, I would like to prevent the files from becoming corrupt if two computers write to the same file at the same time. If I have understood correctly, Git implements some kind of file locking itself and does not depend on the file locking of the operating system.
What could be an outline to make this work? The files I would like to store this way are just .tex files and vector images.
I know that there are existing solutions (like Dropbox), but I don't really like that it is closed source.
Comments (1)
First, let me say that I would not recommend blindly running Git on S3. Git produces a lot of small files during its operation, and S3 is expensive (and slow) when dealing with a large number of very small objects. As you surmise, S3 also has no locking mechanism; eventual consistency makes this impossible. Finally, Git depends on fast random access to its object database; S3 cannot provide this, so you'll need a local mirror of the entire repository in any case.
Instead, I would recommend that you extend the existing Git HTTP backend to push to S3. Rather than pushing loose files, this would push a single pack file. This plays to what S3 is good at: bulk loads of large objects. You'd still have no locking, but since you decide when to push manually, you can find some other way to coordinate things easily enough.
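To give a rough idea of the "one large object per push" approach, here is a minimal Python sketch. It does not extend the Git HTTP backend; as a simpler stand-in it uses `git bundle`, which packs all refs and objects into a single file, and hands that file to an upload callable. The function names, the timestamped key scheme, and the `upload` parameter are all assumptions made for illustration, not part of any existing tool.

```python
import datetime
import subprocess
from pathlib import Path


def bundle_command(repo_dir, bundle_path):
    """Build the `git bundle create` command that packs all refs into one file."""
    return ["git", "-C", str(repo_dir), "bundle", "create", str(bundle_path), "--all"]


def bundle_key(prefix, now=None):
    """Return a timestamped object key for the bundle.

    The naming scheme is hypothetical; any unique key would do.
    """
    now = now or datetime.datetime.now(datetime.timezone.utc)
    return f"{prefix}/{now:%Y%m%dT%H%M%SZ}.bundle"


def push_bundle(repo_dir, bucket, prefix, upload, workdir="/tmp"):
    """Create a single-file bundle of the local repository and upload it.

    `upload` is any callable taking (local_path, bucket, key). It is a
    parameter here so that the sketch does not depend on a specific S3 client.
    """
    key = bundle_key(prefix)
    bundle_path = Path(workdir) / Path(key).name
    # One large file instead of many loose objects.
    subprocess.run(bundle_command(repo_dir, bundle_path), check=True)
    upload(str(bundle_path), bucket, key)
    return key
```

With boto3 installed, `boto3.client("s3").upload_file` takes its arguments in the same order (file name, bucket, key), so something like `push_bundle("/home/me/papers", "my-bucket", "papers", boto3.client("s3").upload_file)` should work; the path and bucket name are placeholders. On another machine, the downloaded bundle can be used as a remote with `git clone` or `git fetch`. Nothing here prevents two machines from pushing at the same moment; the timestamped keys only keep them from overwriting each other's uploads.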