What are the advantages of DVC, git-annex, and git-lfs over plain Git for large or binary files?

Posted on 2025-01-17 16:05:20

If I have different versions of a file, e.g., in different branches, and I try to reconcile them, git has great mechanisms for that. However, to do the reconciliation, e.g., in a merge, git requires access to the "inside" of the file. Thus files should be text files.

If I change a version-controlled file, git does not save the delta between the versions, but saves an entire snapshot of the file. If one makes a change, even a small one, to a large file, the entire file will be stored twice by git. Thus files should be small.
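
The snapshot behaviour described above can be illustrated with a minimal sketch. It computes the object id Git assigns to a file's content (a blob), assuming the default SHA-1 object format; the helper name `git_blob_id` and the sample data are made up for illustration. Note that Git may later delta-compress objects when it packs them, which this sketch does not model.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Return the SHA-1 id Git gives a blob with this exact content.

    Git hashes the header "blob <size>\\0" followed by the raw bytes
    (assumes the default SHA-1 object format, not a SHA-256 repository).
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


# A 1 MB "file" and a copy that differs in a single byte (illustrative data).
original = b"x" * 1_000_000
edited = original[:-1] + b"y"

# Different ids mean two separate full-content objects, not one object plus a delta.
print(git_blob_id(original))
print(git_blob_id(edited))
print("same object?", git_blob_id(original) == git_blob_id(edited))
```

Because the id is derived from the whole content, any edit yields a new object that initially holds the full file.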

Files that are large or binary (or both) should not be tracked by Git. If I still need them in my project, I should use something like DVC, git-annex, or git-lfs.

As far as I understand, all three of those keep such files outside of git and keep a reference to them, which is tracked by git. I will use DVC as a stand-in, as I know even less about the other two.

  1. In DVC, the reference is a text file, and thus git will not get confused. However, since it is only a reference, there is not much merging for git to do anyway. So git's reconciliation capabilities are not really required. What is the advantage of using DVC in this respect, then? Can't I just use git and simply not use those mechanisms?

  2. In DVC, it seems that if I change a large file, a snapshot of that file is created, just like in git (no delta is saved). So how does this improve the situation compared to git? I still get lots of (near) copies of this big file.
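
To make the "reference" in point 1 concrete, here is a rough sketch of a small text pointer that Git could track in place of the large file. The layout is modeled on the Git LFS pointer file (version, oid, size); DVC's `.dvc` files are YAML and differ in detail. The function name `make_pointer` and the sample data are hypothetical.

```python
import hashlib


def make_pointer(data: bytes) -> str:
    """Build a small text pointer for a large file.

    Modeled on the Git LFS pointer layout; this is an illustration,
    not a replacement for the real tools.
    """
    oid = hashlib.sha256(data).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(data)}\n"
    )


big_file = b"\x00" * 5_000_000  # stand-in for a 5 MB binary file
pointer = make_pointer(big_file)

print(pointer)
print(len(pointer.encode("utf-8")), "bytes tracked by Git instead of", len(big_file))
```

The pointer stays about the same size no matter how large the file is, so the Git history only grows by a few lines of text per version.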

I understand from here that git-lfs keeps most of the (near) copies of my file in the remote storage. Only when I check out the respective version of the large file is that file downloaded. In that case, while I would be correct about my point 2, at least it is only a "problem" for the server (in terms of space), and not for my local disk space or my internet bandwidth usage. This might be the same for DVC.
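
The download-on-checkout behaviour described above can be modeled in a few lines. This is a toy in-memory sketch: the dictionaries `remote` and `local` and the functions `push` and `checkout` are hypothetical stand-ins, not the API of git-lfs or DVC.

```python
import hashlib

remote = {}  # stands in for server-side storage: oid -> content
local = {}   # stands in for the local cache: oid -> content


def push(data: bytes) -> str:
    """Upload one version of a file and return its content id."""
    oid = hashlib.sha256(data).hexdigest()
    remote[oid] = data
    return oid


def checkout(oid: str) -> bytes:
    """Fetch a version from the remote only if it is not cached locally."""
    if oid not in local:
        local[oid] = remote[oid]
    return local[oid]


# Five versions exist on the remote, but only the one checked out is fetched.
history = [push(b"model weights v%d" % i) for i in range(5)]
checkout(history[-1])

print(len(remote), "versions on the remote,", len(local), "downloaded locally")
```

In this model the (near) copies still exist, but they cost space only on the remote side until a specific version is requested.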

Are my "objections" or "caveats" regarding points 1 and 2 valid?

Comments (1)

世态炎凉 2025-01-24 16:05:20

It's more of a need than just an advantage.

  • Git is not meant to handle binary files in the first place, as their contents are not necessarily incremental (as with text/code), so there is no "delta saving" either.
  • While Git can technically handle arbitrarily large files, it will be very slow at indexing them.
  • Git hosting services like GitHub do have file size limits (even with LFS).

DVC in particular is nice because you don't need special servers to use it; you just configure any storage provider you already own (e.g. some SSH box or an S3 bucket).

Re 2. DVC also makes sure no files are duplicated in your storage, based on their content (great for datasets organized as many small files in a directory structure).
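
Content-based deduplication of this kind can be sketched as a content-addressed store: each file is saved under a name derived from its hash, so identical content is written only once. This is a simplified illustration, not DVC's implementation; the `store` function, the SHA-256 choice, and the two-character directory layout are assumptions made for the example, and DVC's actual hash and cache layout may differ.

```python
import hashlib
import tempfile
from pathlib import Path


def store(cache: Path, data: bytes) -> str:
    """Save content under its hash; identical content is stored only once."""
    digest = hashlib.sha256(data).hexdigest()
    target = cache / digest[:2] / digest[2:]
    if not target.exists():  # already present means nothing new to write
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
    return digest


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    # Three dataset files, two of which have identical content (made-up data).
    dataset = {"a.png": b"cat", "b.png": b"dog", "copy_of_a.png": b"cat"}
    refs = {name: store(cache, data) for name, data in dataset.items()}
    stored = [p for p in cache.rglob("*") if p.is_file()]
    print(len(dataset), "files referenced,", len(stored), "objects stored")
```

Files that are byte-for-byte identical share one stored object; near copies that differ by even one byte still get separate objects, which matches the concern raised in point 2.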
