Storing generated files in Git

Posted 2024-11-01 01:43:24

We have a reasonably large, and far too messy code base that we wish to migrate to using Git. At the moment, it's a big monolithic chunk that can't easily be split into smaller independent components. The code builds a large number of shared libraries, but their source code is so interleaved that it can't be cleanly separated into separate repositories at the moment.

I'm not too concerned with whether Git can cope with having all the code in a single repository, but a problem is that we need to version both the source code and many of the libraries built from it. Building everything from scratch takes hours, so when checking out the code, developers should also get precompiled versions of these libraries to save time.

And this is where I could use some advice. The libraries don't need to be 100% up to date (they generally maintain binary compatibility, and can always be rebuilt by the individual developer if necessary). So I'm looking for ways to avoid cluttering up our source code repository with countless marginally different versions of binary files that can be regenerated from the source anyway, while still making the libraries easily accessible to developers so they don't have to rebuild everything from scratch.

So I'd like some way to achieve something like the following.

  • The libraries are generated by our build server on a regular basis, and the build server could then commit them to the Git repository. The developers should treat these files as read-only (pull the latest version and, when necessary, rebuild in place, but don't commit new versions), and ideally Git should enforce this. In particular, a developer running a quick git commit -a shouldn't end up accidentally polluting the repository with a new revision of all these generated files.
  • Keep these files in a separate repository, so the source code won't have to carry around all these generated binary files perpetually (they're a convenience to cut down on compilation time, but they're not actually necessary).

Of course, at the same time, the process of using these should be as smooth as possible. When checking out the source, the libraries built from it should follow (or at least, be easy to get). And when committing, it shouldn't be possible to accidentally commit new versions of these libraries, just because they were recompiled and now have a different timestamp embedded.

I've been looking at the option of using Git's submodules: creating a "super" repository containing the source code, and then one or more submodules for the generated libraries. So far, though, it seems a bit too clumsy and fragile for my taste. Submodules don't seem to actually prevent a developer from committing changes directly to the submodule; doing so just causes things to break further down the line (while playing around with submodules, I've ended up with more detached HEADs than I care to count).

Considering virtually all our developers are new to Git, that may end up wasting more time than it saves us.

So what are our options? Does the submodule approach sound sensible to you Git gurus out there? And how do I "tame" it, so it's as easy to use (and hard to mess up) as possible for our developers?

Or is there an entirely different solution we haven't considered?

I should mention that I've only used Git for a couple of days, so I'm pretty much a newbie myself.

Comments (3)

独闯女儿国 2024-11-08 01:43:24

I would keep these in a separate repository from the source files. You can use Git submodules to keep a reference between the two, so that the 'compiled libs' repository becomes the parent and the source becomes the submodule. That way, when you commit the libs, you also commit a reference to the exact state of the source code at that time.

Further, since developers don't need the full history, you can use git clone --depth 1 libs.git, which gives you only the latest version of the libs. It doesn't pull further history, and in older versions of Git a shallow clone couldn't be pushed from (which is OK, since the server should be doing the committing for you). Developers still get access to the latest versions (or to whatever branch you specify on the clone command with -b).
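The shallow clone described above can be tried out in a throwaway sandbox. This is only a minimal sketch, assuming a POSIX shell and git on the PATH; the repository names (libs-origin, libs), the file libfoo.so, and the build-server identity are all made up for the demonstration.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox; every name below is hypothetical.
workdir=$(mktemp -d)
export GIT_AUTHOR_NAME="Build Server"
export GIT_AUTHOR_EMAIL="build@example.invalid"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME"
export GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# Stand-in for the libs repository that the build server commits to.
git init -q "$workdir/libs-origin"
for build in 1 2 3; do
    echo "binary contents of build $build" > "$workdir/libs-origin/libfoo.so"
    git -C "$workdir/libs-origin" add libfoo.so
    git -C "$workdir/libs-origin" commit -q -m "Nightly build $build"
done

# A developer's shallow clone. The file:// URL matters in this demo:
# with a plain local path, git ignores --depth.
git clone -q --depth 1 "file://$workdir/libs-origin" "$workdir/libs"

# Count the commits reachable in each repository.
full_count=$(git -C "$workdir/libs-origin" rev-list --count HEAD)
shallow_count=$(git -C "$workdir/libs" rev-list --count HEAD)
echo "origin has $full_count commits, shallow clone has $shallow_count"
```

The developer's copy contains the newest libfoo.so but only a single commit, so the clone stays small no matter how many nightly builds the server has committed. Adding -b <branch> to the clone command would pick a branch other than the default.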

Ideally you don't want the main git repository containing, or pointing to, the binary repository.

我很OK 2024-11-08 01:43:24

The ideal solution is to avoid versioning binaries and store them in an artifact repository like Nexus.

The issue with keeping deliveries in a VCS is that a VCS is designed to record and keep the history of all the files it manages, whereas:

  • many versions of a delivery are intermediate builds that will need to be cleaned up at one point or another;
  • cleaning up (removing old versions) is quite hard to do in a VCS, but very easy to do in an artifact repository;
  • the size of the repo will become an issue (especially for a DVCS, unless you always get only the latest version, in which case a shallow clone might alleviate the issue);
  • there is no way of comparing one version of a binary with another (so "versioning" doesn't make a lot of sense).

故事未完 2024-11-08 01:43:24

I am not a Git guru, but I guess this could be solved with submodules. Add the precompiled binaries as a submodule; to get them, one then simply has to run:

git submodule update --init

How to ignore changes in submodules is described here. So, if a developer rebuilds something, it will NOT be committed by git commit -a, nor staged by "git add .". Developers just have to make sure they do not commit anything directly from within the submodule. This Vimcast shows how to use submodules for keeping your vimfiles under control, but it should be easy to adapt to your problem.
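One way to hide in-place rebuilds from git status is the submodule.<name>.ignore setting. The following is a rough sketch rather than a recommended setup, assuming a POSIX shell and git on the PATH; the repository names (libs-origin, source), the submodule path libs, and the files main.c and libfoo.so are hypothetical.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox; every name below is hypothetical.
workdir=$(mktemp -d)
export GIT_AUTHOR_NAME="Demo Developer"
export GIT_AUTHOR_EMAIL="dev@example.invalid"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME"
export GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# Stand-in for the repository of prebuilt libraries.
git init -q "$workdir/libs-origin"
echo "prebuilt" > "$workdir/libs-origin/libfoo.so"
git -C "$workdir/libs-origin" add libfoo.so
git -C "$workdir/libs-origin" commit -q -m "Prebuilt libraries"

# Source repository that pulls the libraries in as a submodule at libs/.
git init -q "$workdir/source"
cd "$workdir/source"
echo "int main(void) { return 0; }" > main.c
git add main.c
git commit -q -m "Initial source"
# protocol.file.allow is only needed because this demo clones a local path.
git -c protocol.file.allow=always submodule --quiet add "$workdir/libs-origin" libs
git commit -q -m "Add prebuilt libraries as a submodule"

# Simulate a developer rebuilding a library in place.
echo "rebuilt locally" > libs/libfoo.so
status_before=$(git status --porcelain)

# Ask status and diff to ignore uncommitted changes inside the submodule.
git config submodule.libs.ignore dirty
status_after=$(git status --porcelain)

echo "before: '$status_before'"
echo "after:  '$status_after'"
```

Before the setting is applied, status reports libs as modified; afterwards the working tree looks clean. Note that the parent repository records only a commit ID for the submodule, so files rebuilt inside libs/ cannot reach the parent's history unless someone first commits inside the submodule itself. The same ignore setting can also be placed in .gitmodules so that every clone shares it.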
