Distributed version control for HUGE projects - is it feasible?

Posted 2024-08-25 13:45:24

We're pretty happy with SVN right now, but Joel's tutorial intrigued me. So I was wondering - would it be feasible in our situation too?

The thing is - our SVN repository is HUGE. The software itself carries a 15-year legacy and has already survived several different source control systems. There are over 68,000 revisions (changesets), the source itself takes up over 100MB, and I can't even begin to guess how many GB the whole repository consumes.

The problem then is simple - a clone of the whole repository would probably take ages to make, and would consume far more space on the drive than is remotely sane. And since the very point of distributed version control is to have as many repositories as needed, I'm starting to have doubts.

How does Mercurial (or any other distributed version control) deal with this? Or are they unusable for such huge projects?

Added: To clarify - the whole thing is one monolithic beast of a project which compiles to a single .EXE and cannot be split up.

Added 2: Second thought - The Linux kernel repository uses git and is probably an order of magnitude or two bigger than mine. So how do they make it work?

拥抱没勇气 2024-09-01 13:45:24

Distributed version control for HUGE projects - is it feasible?

Absolutely! As you know, Linux is massive and uses Git. Mercurial is used for some major projects too, such as Python, Mozilla, OpenSolaris and Java.

We're pretty happy with SVN right now, but Joel's tutorial intrigued me. So I was wondering - would it be feasible in our situation too?

Yes. And if you're happy with Subversion now, you're probably not doing much branching and merging!

The thing is - our SVN repository is HUGE. [...] There are over 68,000 revisions (changesets), the source itself takes up over 100MB

As others have pointed out, that's actually not so big compared to many existing projects.

The problem then is simple - a clone of the whole repository would probably take ages to make, and would consume far more space on the drive than is remotely sane.

Both Git and Mercurial are very efficient at managing the storage, and their repositories take up far less space than the equivalent Subversion repo (having converted a few). And once you have an initial checkout, you're only pushing deltas around, which is very fast. They are both significantly faster in most operations. The initial clone is a one-time cost, so it doesn't really matter how long it takes (and I bet you'd be surprised!).

And since the very point of distributed version control is to have as many repositories as needed, I'm starting to have doubts.

Disk space is cheap. Developer productivity matters far more. So what if the repo takes up 1GB? If you can work smarter, it's worth it.

How does Mercurial (or any other distributed version control) deal with this? Or are they unusable for such huge projects?

It is probably worth reading up on how projects using Mercurial such as Mozilla managed the conversion process. Most of these have multiple repos, which each contain major components. Mercurial and Git both have support for nested repositories too. And there are tools to manage the conversion process - Mercurial has built-in support for importing from most other systems.
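
As a minimal illustration of the nested-repository support mentioned above, here is a sketch using Mercurial subrepos; the component path and URL are hypothetical:

    # Clone the component into place, then register it in .hgsub
    hg clone https://hg.example.com/core libs/core
    echo 'libs/core = https://hg.example.com/core' > .hgsub
    hg add .hgsub
    hg commit -m "Track libs/core as a subrepository"
    # Future clones of the parent pull libs/core at the recorded revision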

Added: To clarify - the whole thing is one monolithic beast of a project which compiles to a single .EXE and cannot be split up.

That makes it easier, as you only need the one repository.

Added 2: Second thought - The Linux kernel repository uses git and is probably an order of magnitude or two bigger than mine. So how do they make it work?

Git is designed for raw speed. The on-disk format, the wire protocol, the in-memory algorithms are all heavily optimized. And they have developed sophisticated workflows, where patches flow from individual developers, up to subsystem maintainers, up to lieutenants, and eventually up to Linus. One of the best things about DVCS is that they are so flexible they enable all sorts of workflows.

I suggest you read the excellent book on Mercurial by Bryan O'Sullivan, which will get you up to speed fast. Download Mercurial and work through the examples, and play with it in some scratch repos to get a feel for it.

Then fire up the convert command to import your existing source repository. Then try making some local changes, commits, branches, view logs, use the built-in web server, and so on. Then clone it to another box and push around some changes. Time the most common operations, and see how it compares. You can do a complete evaluation at no cost but some of your time.
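
A rough sketch of that evaluation loop, assuming a hypothetical SVN URL (convert is a bundled Mercurial extension that must first be enabled in your hgrc):

    # In ~/.hgrc:
    #   [extensions]
    #   convert =
    hg convert http://svn.example.com/bigrepo bigrepo-hg   # import the SVN history
    cd bigrepo-hg
    hg update                  # populate a working copy
    hg log --limit 5           # browse recent changesets
    hg serve                   # built-in web UI at http://localhost:8000
    # From another machine, clone over SSH and push changes back:
    hg clone ssh://devbox//path/to/bigrepo-hg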

宣告ˉ结束 2024-09-01 13:45:24

100MB of source code is less than the Linux kernel. The changelog between Linux kernel 2.6.33 and 2.6.34-rc1 alone has 6,604 commits. Your repository's scale doesn't sound intimidating to me.

  • Linux kernel 2.6.34-rc1 uncompressed from .tar.bz2 archive: 445MB
  • Linux kernel 2.6 head checked out from main Linus tree: 827MB

Twice as much, but still peanuts with the big hard drives we all have.

怀中猫帐中妖 2024-09-01 13:45:24

Don't worry about repository space requirements. My anecdote: when I converted our codebase from SVN to git (full history - I think), I found that the clone used less space than just the SVN working directory. SVN keeps a pristine copy of all your checked-out files: look at $PWD/.svn/text-base/ in any SVN checkout. With git, the entire history takes less space than that.

What really surprised me was how network-efficient git is. I did a git clone of a project at a well-connected place, then took it home on a flash disk, where I keep it up to date with git fetch / git pull, with just my puny little GPRS connection. I wouldn't dare to do the same in an SVN-controlled project.

You really owe it to yourself to at least try it. I think you'll be amazed at just how wrong your centralised-VCS-centric assumptions were.
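
If you want to reproduce that comparison, a quick sketch (the directory names are hypothetical; .svn/text-base is the pre-1.7 Subversion layout this answer refers to):

    # Total size of SVN's pristine copies, spread across every .svn directory
    find project-svn -type d -path '*/.svn/text-base' -print0 | xargs -0 du -ch | tail -n 1
    # git's complete packed history lives in a single .git directory
    du -sh project-git/.git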

听,心雨的声音 2024-09-01 13:45:24

Do you need all the history? If you only need the last year or two, you could consider leaving the current repository in a read-only state for historical reference. Then create a new repository with only recent history by running svnadmin dump with a lower-bound revision; that dump forms the basis for your new distributed repository.

I do agree with the other answer that a 100MB working copy and 68K revisions isn't that big. Give it a shot.
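
A sketch of that approach; the cutoff revision (60000) and paths are made up:

    # Dump only recent history; the first dumped revision is written as a
    # complete snapshot, so the new repository is self-contained
    svnadmin dump /srv/svn/bigrepo -r 60000:HEAD > recent.dump
    svnadmin create /srv/svn/bigrepo-recent
    svnadmin load /srv/svn/bigrepo-recent < recent.dump
    # Keep /srv/svn/bigrepo around read-only for historical reference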

坐在坟头思考人生 2024-09-01 13:45:24

You say you're happy with SVN... so why change?

As far as distributed version control systems go, Linux uses git and Sun uses Mercurial. Both are impressively large source code repositories, and they work just fine. Yes, you end up with all revisions on all workstations, but that's the price you pay for decentralisation. Remember that storage is cheap - my development laptop currently has 1TB (2x500GB) of hard disk storage on board. Have you tested pulling your SVN repo into something like Git or Mercurial to actually see how much space it would take?

My question would be - are you ready as an organisation to go decentralised? For a software shop it usually makes much more sense to keep a central repository (regular backups, hook-ups to CruiseControl or FishEye, easier to control and administer).

And if you just want something faster or more scalable than SVN, then just buy a commercial product - I've used both Perforce and Rational ClearCase and they scale up to huge projects without any problems.
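
One low-cost way to run that test is a one-shot import with git-svn (the URL is hypothetical; git svn ships with most git distributions):

    git svn clone --stdlayout http://svn.example.com/bigrepo bigrepo-git
    cd bigrepo-git
    git gc --aggressive     # repack tightly for a fair size comparison
    du -sh .git             # the entire history, compressed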

絕版丫頭 2024-09-01 13:45:24

You'd split your one huge repository into lots of smaller repositories, one for each module in your old repo. That way people would simply keep as repositories whatever SVN projects they would have had before. Not much more space is required than before.
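
If you went that route, Mercurial's convert extension can carve a module out of the big repository with a filemap; the module name here is hypothetical:

    # filemap.txt keeps only moduleA and promotes it to the new repo's root
    cat > filemap.txt <<'EOF'
    include moduleA
    rename moduleA .
    EOF
    hg convert --filemap filemap.txt big-repo moduleA-repo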

蓝眼泪 2024-09-01 13:45:24

I am using git on a fairly large C#/.NET project (68 projects in 1 solution) and the TFS footprint of a fresh fetch of the full tree is ~500MB. The git repo, storing a fair number of commits locally, weighs in at ~800MB. The compaction and the way storage works internally in git are excellent. It is amazing to see so many changes packed into such a small amount of space.
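
For anyone curious how git reports those packed sizes, two standard commands:

    git gc                   # repack loose objects into compressed packfiles
    git count-objects -vH    # human-readable totals; see the size-pack line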

本王不退位尔等都是臣 2024-09-01 13:45:24

From my experience, Mercurial is pretty good at handling a large number of files and a huge history. The drawback is that you shouldn't check in files bigger than 10MB. We used Mercurial to keep a history of our compiled DLLs. It's not recommended to put binaries in source control, but we tried it anyway (it was a repository dedicated to the binaries). The repository was about 2GB, and we are not too sure that we will be able to keep doing that in the future. Anyway, for source code I don't think you need to worry.

2024-09-01 13:45:24

Git can obviously work with a project as big as yours since, as you pointed out, the Linux kernel alone is bigger.

The challenge with Mercurial and Git (I don't know if you manage big files) is that they can't manage big files (so far).

I have experience moving a project your size (and one that had been around for 15 years too) from CVS/SVN (a mix of the two, actually) into Plastic SCM for distributed and centralized development (the two workflows happening inside the same organization at the same time).

The move will never be seamless, since it's not only a tech problem but involves a lot of people (a project as big as yours probably involves several hundred developers, doesn't it?), but there are importers to automate the migration, and training can be done very fast too.

近箐 2024-09-01 13:45:24

No, it does not work. You don't want anything that requires significant storage on the client side. If you get that large (by storing, for example, images etc.), the storage requirement exceeds what a normal workstation has anyway, so it cannot be efficient.

You'd better go with something centralized then. Simple math - it simply is not feasible to have tons of GB on every workstation AND be efficient there. It simply makes no sense.
