Git for a peer-to-peer content distribution network
Is anyone using Git in this way?
I would like to distribute some multimedia content from a server to a number of remote Android devices. I would like them to send back a log file with device usage statistics (provided by an Android app I will write).
The server could be anything, but I would prefer a Linux box.
I thought that since Git handles and syncs only the differences between files, it would be a nice tool for this purpose, and I would get content revision history as a bonus.
I need some advice on how the repository architecture could be organized: does it have to be a star topology or something different?
The remote end of the system doesn't need any interactivity; in other words, the remote Git repository could pull and push whatever it needs to, autonomously and automatically.
UPDATE: I've found here on SO the author of Git Internals (I'm downloading it right now), Scott Chacon, talking about the architecture I would like to implement.
UPDATE 2: OK, I read the chapter about "Non-SCM uses of Git" and here is what the author says about a Peer to Peer CDN:
You have to get new content [...] consist of any combination of xml files, images, animations, text and sound. You need to build a content distribution framework that will easily and efficiently transfer all the necessary content to the machines on your network. You need to constantly determine what content each machine has and what it needs to have and transfer the difference as efficiently as possible. [...] It turns out that Git is an excellent solution to this problem.
I couldn't find anything in the book about quoting small portions of it, so I hope I'm not violating any copyright. In any case, I will delete it if someone complains.
Comments (3)
I would advise against using git for such purposes. For starters, git will use extra phone storage for the revision history, and it will effectively send entire files (not deltas) anyway, because multimedia content is binary and diffing does not work well on it. Just implement one method to list the server-side multimedia files with their last-modification dates and another method to download updated files (I would suggest HTTP, as it is the simplest). On the server side, you can of course use git internally for versioning the multimedia files, but I'd rather not expose the git interface.
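The list-and-download approach described above can be sketched roughly as follows. This is only an illustration: the manifest format (a mapping from relative path to last-modified time in Unix seconds) and the function names `files_to_download` and `files_to_delete` are assumptions, not part of any existing API, and the HTTP transfer itself is left out.

```python
from typing import Dict, List

# Hypothetical manifest format: relative path -> last-modified time (Unix seconds).
Manifest = Dict[str, float]


def files_to_download(server: Manifest, local: Manifest) -> List[str]:
    """Paths that are missing on the device or newer on the server."""
    return sorted(
        path
        for path, mtime in server.items()
        if path not in local or local[path] < mtime
    )


def files_to_delete(server: Manifest, local: Manifest) -> List[str]:
    """Paths on the device that the server no longer lists."""
    return sorted(path for path in local if path not in server)


# Made-up example data standing in for the server listing and the device state.
server = {
    "video/intro.mp4": 1700000500.0,
    "img/logo.png": 1700000000.0,
    "audio/jingle.ogg": 1700000200.0,
}
local = {
    "video/intro.mp4": 1700000100.0,
    "img/logo.png": 1700000000.0,
    "old/banner.png": 1690000000.0,
}

print(files_to_download(server, local))
print(files_to_delete(server, local))
```

The device would fetch the manifest over HTTP, run the comparison, and then request only the listed files. Comparing content hashes instead of timestamps would be more robust against clock skew, at the cost of hashing every file on the server.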
The git protocol tries to send patches (deltas) instead of whole files, but the git object model records every version of a file as a complete snapshot (packfiles only compress these on disk) and always keeps the old versions. If you aren't trying to keep file history, git is probably not the tool for the job.
rsync is a mature file distribution tool that, like git, can work over ssh or its own protocol; it can make binary patches and doesn't necessarily keep change history. That is probably the place to start looking to see whether it can do the job.
So in a previous job, we used Git for exactly this. The reason was that our media assets did not change often, so no matter what we used, we would likely have had to send the whole file anyway. Thus the problem with binary deltification, which also affects other content distribution tools, was not important.
The main advantage over rsync (and presumably unison, though I've never used it) is that you can build the content trees in the index and store them in Git under a branch per client, rather than having to have everything on disk to run rsync on. If you have several variations of content, it's pretty cool to be able to record the unique tree of content needed by each client (of which you could have thousands of combinations) and have a simple pull command fetch only what's needed and update it on the client. That was the reason we chose Git instead of rsync. If every client needs exactly the same set of data, perhaps rsync would be easier; however, the other nice thing about Git is that you get a history of the content on each client: when and how it changed for every single client.
We also used it to record log files. Since they are generally pretty uniform and text-based, they delta excellently and transfer very efficiently. We were very happy using it to record our log data and transfer it back upstream.
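To make the branch-per-client idea more concrete, here is a toy model of it in Python. It is not Git itself and not the setup described in this answer: `build_tree`, `objects_to_fetch`, the client names and the file contents are all hypothetical. The only part borrowed from Git is the blob hash (SHA-1 over a `blob <size>` header plus the content), which is why identical files shared by many client trees are stored once.

```python
import hashlib
from typing import Dict, Set

Tree = Dict[str, str]  # relative path -> blob id


def blob_id(data: bytes) -> str:
    """Hash file content the way Git hashes a blob object (SHA-1 repositories)."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def build_tree(files: Dict[str, bytes], store: Dict[str, bytes]) -> Tree:
    """Record one client's content tree; identical content is stored only once."""
    tree: Tree = {}
    for path, data in files.items():
        oid = blob_id(data)
        store[oid] = data
        tree[path] = oid
    return tree


def objects_to_fetch(wanted: Tree, already_have: Set[str]) -> Set[str]:
    """Blob ids a client still needs in order to check out its tree."""
    return set(wanted.values()) - already_have


store: Dict[str, bytes] = {}
logo, audio_en, audio_es = b"logo bytes", b"english audio", b"spanish audio"

# Hypothetical per-client "branches": each maps to its own tree of content.
branches = {
    "client-en": build_tree({"logo.png": logo, "welcome.ogg": audio_en}, store),
    "client-es": build_tree({"logo.png": logo, "welcome.ogg": audio_es}, store),
}

# A device that already holds the English set is re-assigned to the Spanish tree.
on_device = set(branches["client-en"].values())
needed = objects_to_fetch(branches["client-es"], on_device)
print(len(store), "objects stored;", len(needed), "object to fetch")
```

Two trees with four entries in total need only three stored objects, and switching a device from one tree to the other transfers just the one object it lacks. In real Git the same effect comes from the object database and the fetch negotiation rather than from code like this.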
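A rough way to see why append-only text logs transfer cheaply is to compare the size of a diff with the size of the full file. The sketch below uses Python's `difflib` as a stand-in; Git's own delta format is different, and the log line format here is made up for the example.

```python
import difflib


def text_delta(old: str, new: str) -> str:
    """Line-based unified diff with no context lines (a stand-in for a real delta)."""
    return "".join(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            n=0,
        )
    )


# Made-up usage log: sixty existing entries, then one new entry appended.
old_log = "".join(
    f"2024-01-01T00:{i:02d}:00 play clip_{i % 5}.mp4\n" for i in range(60)
)
new_log = old_log + "2024-01-01T01:00:00 play clip_0.mp4\n"

delta = text_delta(old_log, new_log)
print(f"full file: {len(new_log)} bytes, delta: {len(delta)} bytes")
```

Only the appended line and a small header end up in the delta, so the amount sent upstream grows with the new entries rather than with the size of the whole log.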