What makes merging in DVCS easy?

Posted on 2024-08-28 22:15:13

I read at Joel on Software:

With distributed version control, the distributed part is actually not the most interesting part.
The interesting part is that these systems think in terms of changes, not in terms of versions.

and at HgInit:

When we have to merge, Subversion tries to look at both revisions—my modified code, and your modified code—and it tries to guess how to smash them together in one big unholy mess. It usually fails, producing pages and pages of “merge conflicts” that aren’t really conflicts, simply places where Subversion failed to figure out what we did.
By contrast, while we were working separately in Mercurial, Mercurial was busy keeping a series of changesets. And so, when we want to merge our code together, Mercurial actually has a whole lot more information: it knows what each of us changed and can reapply those changes, rather than just looking at the final product and trying to guess how to put it together.

From looking at SVN's repository folder, I have the impression that Subversion maintains each revision as a changeset. And from what I know, Hg uses both changesets and snapshots, while Git uses purely snapshots to store the data.

If my assumption is correct, then there must be other ways that make merging in DVCS easy. What are those?

* Update:

I am more interested in the technical perspective, but answers from a non-technical perspective are acceptable.
Corrections:
1. Git's conceptual model is purely based on snapshots. The snapshots can be stored as diffs of other snapshots, it's just that the diffs are purely for storage optimization. – Rafał Dowgird's comment
From a non-technical perspective:
1. It's simply cultural: a DVCS wouldn't work at all if merging were hard, so DVCS developers invest a lot of time and effort into making merging easy. CVCS users OTOH are used to crappy merging, so there's no incentive for the developers to make it work. (Why make something good when your users pay you equally well for something crap?)
  ...
  To recap: the whole point of a DVCS is to have many decentralized repositories and constantly merge changes back and forth. Without good merging, a DVCS is simply useless. A CVCS, however, can still survive with crappy merging, especially if the vendor can condition its users to avoid branching. – Jörg W Mittag's answer
From a technical perspective:
1. Recording a real DAG of the history does help! I think the main difference is that CVCS didn't always record a merge as a changeset with several parents, losing some information. – tonfa's comment
2. Because of merge tracking, and the more fundamental fact that each revision knows its parents. ... When each revision (each commit), including merge commits, knows its parents (for merge commits that means having/remembering more than one parent, i.e. merge tracking), you can reconstruct the graph (DAG = Directed Acyclic Graph) of the revision history. If you know the graph of revisions, you can find the common ancestor of the commits you want to merge. And when your DVCS itself knows how to find the common ancestor, you don't need to provide it as an argument, as you do for example in CVS.
  .
  Note that there might be more than one common ancestor of two (or more) commits. Git makes use of the so-called "recursive" merge strategy, which merges the merge bases (common ancestors) until you are left with one virtual / effective common ancestor (in some simplification), and can then do a simple 3-way merge. – Jakub Narębski's answer

See also: How and/or why is merging in Git better than in SVN?

末蓝 2024-09-04 22:15:13

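There's nothing in particular about DVCSs that makes merging easier. It's simply cultural: a DVCS wouldn't work at all if merging were hard, so DVCS developers invest a lot of time and effort into making merging easy. CVCS users OTOH are used to crappy merging, so there's no incentive for the developers to make it work. (Why make something good when your users pay you equally well for something crap?)

Linus Torvalds said in one of his Git talks that when he was using CVS at Transmeta, they set aside a whole week of a development cycle for merging. And everybody just accepted this as the normal state of affairs. Nowadays, during a merge window, Linus does hundreds of merges within just a few hours.

CVCSs could have merging capabilities just as good as those of DVCSs, if CVCS users simply went to their vendors and said that this crap is unacceptable. But they are stuck in the Blub paradox: they simply don't know that it is unacceptable, because they have never seen a working merge system. They don't know that there is something better out there.

And when they do try out a DVCS, they magically attribute all the goodness to the "D" part.

Theoretically, due to its centralized nature, a CVCS should have better merge capabilities, because it has a global view of the entire history, unlike a DVCS, where every repository only has a tiny fragment.

To recap: the whole point of a DVCS is to have many decentralized repositories and constantly merge changes back and forth. Without good merging, a DVCS is simply useless. A CVCS, however, can still survive with crappy merging, especially if the vendor can condition its users to avoid branching.

So, just like with everything else in software engineering, it's a matter of effort.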

爺獨霸怡葒院 2024-09-04 22:15:13

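In Git and other DVCSs, merges are easy not because of some mystical "series of changesets" view that Joel talks about (unless you are using Darcs, with its theory of patches, or some Darcs-inspired DVCS; those are a minority, though), but because of merge tracking and the more fundamental fact that each revision knows its parents. For that you need (I think) whole-tree / full-repository commits, which unfortunately limits the ability to do partial checkouts and to commit only a subset of files.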

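When each revision (each commit), including merge commits, knows its parents (for merge commits that means having/remembering more than one parent, i.e. merge tracking), you can reconstruct the graph (DAG = Directed Acyclic Graph) of the revision history. If you know the graph of revisions, you can find the common ancestor of the commits you want to merge. And when your DVCS itself knows how to find the common ancestor, you don't need to provide it as an argument, as you do for example in CVS.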
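
As a rough illustration of the point above, here is a minimal Python sketch. It is not how Git or Mercurial actually implement this; the commit names, the `PARENTS` tables, and the helpers `ancestors` and `merge_bases` are all made up for the example. It shows how recording both parents of a merge commit changes which common ancestor a later merge can find.

```python
from collections import deque

# Hypothetical history: each commit maps to the list of its parents.
# "M" is a merge commit with two parents, which is what merge tracking records.
PARENTS = {
    "A": [],
    "B": ["A"],
    "C": ["B"],        # work on branch 1
    "D": ["B"],        # work on branch 2
    "M": ["C", "D"],   # branch 2 merged into branch 1
    "E": ["M"],        # more work on branch 1
    "F": ["D"],        # more work on branch 2
}

# The same history, but the merge forgot its second parent (no merge tracking).
PARENTS_UNTRACKED = dict(PARENTS, M=["C"])


def ancestors(commit, parents):
    """Return the commit itself plus everything reachable through parent links."""
    seen = set()
    queue = deque([commit])
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        queue.extend(parents[current])
    return seen


def merge_bases(a, b, parents):
    """Common ancestors of a and b that are not ancestors of another common ancestor."""
    common = ancestors(a, parents) & ancestors(b, parents)
    return {
        c for c in common
        if not any(c != other and c in ancestors(other, parents) for other in common)
    }


print(sorted(merge_bases("E", "F", PARENTS)))            # ['D']
print(sorted(merge_bases("E", "F", PARENTS_UNTRACKED)))  # ['B']
```

With the merge recorded, only the work done after D has to be reconciled. Without it, the tool falls back to B and has to look at the changes from D on both sides again, which can show up as conflicts that are not really conflicts.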

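Note that there might be more than one common ancestor of two (or more) commits. Git makes use of the so-called "recursive" merge strategy, which merges the merge bases (common ancestors) until you are left with one virtual / effective common ancestor (in some simplification), and can then do a simple 3-way merge.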
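
To make the final 3-way merge step concrete, here is a small sketch under simplifying assumptions: each snapshot is a plain dict from path to file content, and the merge works on whole files rather than on lines. The function name `three_way_merge` and the sample trees are hypothetical, and the recursive handling of several merge bases is not covered.

```python
def three_way_merge(base, ours, theirs):
    """Merge two snapshots against their common ancestor, one file at a time.

    Each snapshot is a dict mapping path -> content; a missing path means
    the file does not exist in that snapshot.
    """
    merged = {}
    conflicts = []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            result = o          # both sides agree (or neither touched it)
        elif o == b:
            result = t          # only their side changed the file
        elif t == b:
            result = o          # only our side changed the file
        else:
            conflicts.append(path)  # both sides changed it differently
            continue
        if result is not None:      # None means the file was deleted
            merged[path] = result
    return merged, conflicts


base = {"a.txt": "1", "b.txt": "1", "c.txt": "1"}
ours = {"a.txt": "2", "b.txt": "1", "c.txt": "2"}
theirs = {"a.txt": "1", "b.txt": "3", "c.txt": "3", "d.txt": "new"}

merged, conflicts = three_way_merge(base, ours, theirs)
print(merged)     # {'a.txt': '2', 'b.txt': '3', 'd.txt': 'new'}
print(conflicts)  # ['c.txt']
```

The base is what lets the merge tell "only one side changed this" apart from "both sides changed this", which a plain two-way comparison of the final products cannot do.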

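Git's rename detection was created to be able to deal with merges involving file renames. (This supports Jörg W Mittag's argument that DVCSs have better merge support because they had to have it: merges are much more common there than in a CVCS, where the merge is hidden in the "update" command of the update-then-commit workflow; cf. Understanding Version Control (WIP) by Eric S. Raymond.)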
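
For a feel of what content-based rename detection involves, here is a minimal sketch. It is an assumption-laden toy and not Git's actual algorithm: it pairs deleted and added paths by text similarity using Python's `difflib`, and the function name `detect_renames`, the sample trees, and the 0.5 threshold are illustrative choices only.

```python
from difflib import SequenceMatcher


def detect_renames(old_tree, new_tree, threshold=0.5):
    """Guess renames by pairing deleted paths with sufficiently similar added paths.

    Trees are dicts mapping path -> content. Returns {old_path: new_path}.
    """
    deleted = {p: c for p, c in old_tree.items() if p not in new_tree}
    added = {p: c for p, c in new_tree.items() if p not in old_tree}
    renames = {}
    for old_path, old_content in sorted(deleted.items()):
        best_path, best_score = None, threshold
        for new_path, new_content in sorted(added.items()):
            if new_path in renames.values():
                continue  # already claimed by another deleted file
            score = SequenceMatcher(None, old_content, new_content).ratio()
            if score >= best_score:
                best_path, best_score = new_path, score
        if best_path is not None:
            renames[old_path] = best_path
    return renames


old_tree = {
    "util.py": "def add(a, b):\n    return a + b\n",
    "README": "hello\n",
}
new_tree = {
    "helpers.py": "def add(a, b):\n    return a + b  # sum\n",
    "README": "hello\n",
    "notes.txt": "todo\n",
}

print(detect_renames(old_tree, new_tree))  # {'util.py': 'helpers.py'}
```

Once a rename is recognized, a merge can apply the other side's edits to the file under its new name instead of reporting a delete on one side and an unrelated new file on the other.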

浮华 2024-09-04 22:15:13

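Part of the reason is of course the technical argument that DVCSs store more information than SVN does (DAG, copies) and also have a simpler internal model, which is why they are able to perform more accurate merges, as mentioned in the other responses.

However, probably an even more important difference is that because you have a local repository, you can make frequent, small commits, and also frequently pull and merge incoming changes. This is caused more by the "human factor", i.e. the differences in the way a person works with a centralized VCS versus a DVCS.

With SVN, if you update and there are conflicts, SVN will merge what it can and insert markers in your code where it can't. The big problem with this is that your code will no longer be in a workable state until you resolve all the conflicts.

This distracts you from the work you are trying to get done, so SVN users typically do not merge while they are working on a task. Combine this with the fact that SVN users also tend to let changes accumulate into a single large commit for fear of breaking other people's working copies, and there will be long periods of time between the branch and the merge.

With Mercurial, you can merge incoming changes much more frequently, in between your smaller incremental commits. This will tend to result in fewer merge conflicts, because you will be working on a more up-to-date codebase.

And if there turns out to be a conflict, you can decide to postpone the merge and do it at your own leisure. This in particular makes merging much less annoying.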

夜声 2024-09-04 22:15:13

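Whoa, attack of the five-paragraph essays!

In short, nothing makes it easy. It's hard, and my experience shows that errors do happen. But:

DVCSs force you to deal with merging, which means taking a few minutes to become familiar with the tools that exist to help you. That alone helps.
DVCSs encourage you to merge often, which helps too.

The snippet of hginit that you quoted, claiming that Subversion is unable to do 3-way merges and that Mercurial merges by looking at all changesets in both branches, is simply wrong on both counts.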

日记撕了你也走了 2024-09-04 22:15:13

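One point is that svn merge is subtly broken; see http://blogs.open.collab.net/svn/2008/07/subversion-merg.html. I suspect that this is connected with svn recording mergeinfo even on cherry-pick merges. Add in a few simple bugs in handling edge cases, and svn, as the current poster child of CVCSs, makes them look bad compared with all the DVCSs that simply got it right.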

巨坚强 2024-09-04 22:15:13

I think the DAG of changesets, as mentioned by others, makes a big difference. DVCSs require split history (and merges) at a fundamental level, whereas I suppose CVCSs (which are older) were built from day 1 to track revisions and files first, with merge support being added as an afterthought.

So:

Merging is easy to do and track when tags/branches are tracked separately from the directory tree of sources, so the entire repo can be merged in one go.
Since DVCSs have local repos, these are easy to create, so it turns out to be easy to keep different modules in different repos instead of tracking them all inside one big repo. (So repo-wide merges don't cause the same disruptions as they would in svn/cvs, where one repo often contains many unrelated modules which need to have separate merge histories.)
CVS/SVN allow different files in the working directory to come from different revisions, while DVCSs usually have one revision for the entire WC, always. (I.e. even if a file is reverted to an earlier version, it will show as modified in status, as it is different from the file in the checked-out revision. SVN/CVS do not always show this.)

Mixing these concepts (as Subversion does) is, I believe, a big mistake. For instance, it has branches/tags inside the source tree, so there you have to track which revisions of files have been merged to other files. This is clearly more complex than just tracking which revisions have been merged.

So, summarizing:

DVCSs need easy merges, and have their feature set based on that. Design decisions were made so that these merges are easy to do and track (via the DAG), and other features (branches/tags/submodules) are implemented to suit that, not the other way around.
CVCSs had some features from the start (such as modules) that made some things easy, but make repo-wide merges very tricky to implement.

At least this is what I feel from my experience with cvs, svn, git and hg. (There probably are other CVCSs which have got this right too.)
