What makes merging in DVCS easy?
I read at Joel on Software:
With distributed version control, the
distributed part is actually not the
most interesting part.
The interesting part is that these
systems think in terms of changes, not
in terms of versions.
and at HgInit:
When we have to merge, Subversion
tries to look at both revisions—my
modified code, and your modified
code—and it tries to guess how to
smash them together in one big unholy
mess. It usually fails, producing
pages and pages of “merge conflicts”
that aren’t really conflicts, simply
places where Subversion failed to
figure out what we did.
By contrast, while we were working
separately in Mercurial, Mercurial was
busy keeping a series of changesets.
And so, when we want to merge our code
together, Mercurial actually has a
whole lot more information: it knows
what each of us changed and can
reapply those changes, rather than
just looking at the final product and
trying to guess how to put it
together.
By looking at the SVN repository folder, I have the impression that Subversion maintains each revision as a changeset. And from what I know, Hg uses both changesets and snapshots, while Git purely uses snapshots to store the data.
If my assumption is correct, then there must be other ways that make merging in DVCS easy. What are those?
* Update:
- I am more interested in the technical perspective, but answers from non-technical perspective are acceptable
- Corrections:
- Git's conceptual model is purely based on snapshots. The snapshots can be stored as diffs of other snapshots, it's just that the diffs are purely for storage optimization. – Rafał Dowgird's comment
- From non-technical perspective:
- It's simply cultural: a DVCS wouldn't work at all if merging were hard, so DVCS developers invest a lot of time and effort into making merging easy. CVCS users OTOH are used to crappy merging, so there's no incentive for the developers to make it work. (Why make something good when your users pay you equally well for something crap?)
...
To recap: the whole point of a DVCS is to have many decentralized repositories and constantly merge changes back and forth. Without good merging, a DVCS simply is useless. A CVCS however, can still survive with crappy merging, especially if the vendor can condition its users to avoid branching. – Jörg W Mittag's answer
- From technical perspective:
- recording a real DAG of the history does help! I think the main difference is that CVCS didn't always record a merge as a changeset with several parents, losing some information. – tonfa's comment
- because of merge tracking, and the more fundamental fact that each revision knows its parents. ... When each revision (each commit), including merge commits, knows its parents (for merge commits that means having/remembering more than one parent, i.e. merge tracking), you can reconstruct the diagram (DAG = Directed Acyclic Graph) of revision history. If you know the graph of revisions, you can find the common ancestor of the commits you want to merge. And when your DVCS itself knows how to find the common ancestor, you don't need to provide it as an argument, as you do for example in CVS.
Note that there might be more than one common ancestor of two (or more) commits. Git makes use of the so-called "recursive" merge strategy, which merges the merge bases (common ancestors) until you are left with one virtual / effective common ancestor (in some simplification), and can then do a simple 3-way merge. – Jakub Narębski's answer
See also: How and/or why is merging in Git better than in SVN?
There's nothing in particular in DVCSs that makes merging easier. It's simply cultural: a DVCS wouldn't work at all if merging were hard, so DVCS developers invest a lot of time and effort into making merging easy. CVCS users OTOH are used to crappy merging, so there's no incentive for the developers to make it work. (Why make something good when your users pay you equally well for something crap?)
Linus Torvalds said in one of his Git talks that when he was using CVS at Transmeta, they set aside an entire week during a development cycle for merging. And everybody just accepted this as the normal state of affairs. Nowadays, during a merge window, Linus does hundreds of merges within just a few hours.
CVCSs could have just as good merging capabilities as DVCSs, if CVCS users simply went to their vendors and said that this crap is unacceptable. But they are caught in the Blub paradox: they simply don't know that it is unacceptable, because they have never seen a working merge system. They don't know that there is something better out there.
And when they do try out a DVCS, they magically attribute all the goodness to the "D" part.
Theoretically, due to its centralized nature, a CVCS should have better merge capabilities, because it has a global view of the entire history, unlike a DVCS, where every repository only has a tiny fragment.
To recap: the whole point of a DVCS is to have many decentralized repositories and constantly merge changes back and forth. Without good merging, a DVCS simply is useless. A CVCS however, can still survive with crappy merging, especially if the vendor can condition its users to avoid branching.
So, just like with everything else in software engineering, it's a matter of effort.
In Git and other DVCSs, merges are easy not because of some mystical "series of changesets" view that Joel rambles about (unless you are using Darcs, with its theory of patches, or some Darcs-inspired DVCS; they are a minority, though), but because of merge tracking and the more fundamental fact that each revision knows its parents. For that you need (I think) whole-tree / full-repository commits, which unfortunately limits the ability to do partial checkouts and to commit only a subset of files.
When each revision (each commit), including merge commits, knows its parents (for merge commits that means having/remembering more than one parent, i.e. merge tracking), you can reconstruct the diagram (DAG = Directed Acyclic Graph) of revision history. If you know the graph of revisions, you can find the common ancestor of the commits you want to merge. And when your DVCS itself knows how to find the common ancestor, you don't need to provide it as an argument, as you do for example in CVS.
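To make the point about parents and common ancestors concrete, here is a minimal Python sketch. The commit names and the `PARENTS` table are made up for illustration, and the brute-force search is not how Git or Mercurial actually implement this; it only shows that once every commit records its parents, the merge base can be computed rather than supplied by hand.

```python
from collections import deque

# Hypothetical history: each commit maps to the list of its parents.
# "M" is a merge commit, so it records two parents (merge tracking).
PARENTS = {
    "A": [],
    "B": ["A"],
    "C": ["B"],        # work on branch 1
    "D": ["B"],        # work on branch 2
    "M": ["C", "D"],   # branch 2 merged into branch 1
    "E": ["M"],        # more work on branch 1
    "F": ["D"],        # more work on branch 2
}


def ancestors(commit, parents):
    """Return every commit reachable from `commit`, including itself."""
    seen = set()
    queue = deque([commit])
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        queue.extend(parents[current])
    return seen


def merge_bases(a, b, parents):
    """Return the common ancestors of a and b that are not themselves
    ancestors of another common ancestor (the "best" common ancestors)."""
    common = ancestors(a, parents) & ancestors(b, parents)
    return {
        c for c in common
        if not any(c != other and c in ancestors(other, parents) for other in common)
    }


if __name__ == "__main__":
    # With the merge recorded, the base for merging E and F is D.
    print(merge_bases("E", "F", PARENTS))

    # If M had forgotten its second parent, the tool would fall back to B
    # and would have to re-merge everything already merged from D.
    forgetful = dict(PARENTS, M=["C"])
    print(merge_bases("E", "F", forgetful))
```

The second call is the situation tonfa's comment describes: a merge that is not recorded as a changeset with several parents loses exactly the information needed to pick a good base for the next merge.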
Note that there might be more than one common ancestor of two (or more) commits. Git makes use of the so-called "recursive" merge strategy, which merges the merge bases (common ancestors) until you are left with one virtual / effective common ancestor (in some simplification), and can then do a simple 3-way merge.
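For readers who have not seen a 3-way merge spelled out, here is a rough Python sketch of the final step. It works on whole files (a dict of path to content) rather than on lines within a file, which is a deliberate simplification; `three_way_merge` and the sample trees are hypothetical names, not any tool's API. The idea is only that, given a base, a side that did not change something simply takes the other side's version.

```python
def three_way_merge(base, ours, theirs):
    """Merge two trees against their common ancestor.

    Each tree is a dict mapping path -> content; a missing path means the
    file does not exist in that tree. Returns (merged_tree, conflicts).
    """
    merged = {}
    conflicts = []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            result = o      # both sides agree (same edit, or both deleted)
        elif o == b:
            result = t      # only their side changed this path
        elif t == b:
            result = o      # only our side changed this path
        else:
            conflicts.append(path)  # both changed it, in different ways
            continue
        if result is not None:      # None means the file was deleted
            merged[path] = result
    return merged, conflicts


if __name__ == "__main__":
    base = {"a.txt": "1", "b.txt": "1"}
    ours = {"a.txt": "2", "b.txt": "1"}
    theirs = {"a.txt": "1", "b.txt": "3", "c.txt": "new"}
    print(three_way_merge(base, ours, theirs))
```

Without the base, `a.txt` and `b.txt` above would both look like conflicts, since the two sides simply differ. With it, neither is.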
Git's rename detection was created to deal with merges involving file renames. (This supports Jörg W Mittag's argument that DVCSs have better merge support because they had to have it, as merges are much more common than in a CVCS, where the merge is hidden in the 'update' command of the update-then-commit workflow; cf. Understanding Version Control (WIP) by Eric S. Raymond.)
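As an illustration of what content-based rename detection means, here is a small Python sketch. It is an assumption-laden stand-in, not Git's algorithm: it uses `difflib.SequenceMatcher` from the standard library as the similarity measure, and the 0.5 threshold is chosen to echo the 50% default of Git's `-M` option. `detect_renames` and the sample trees are hypothetical.

```python
from difflib import SequenceMatcher


def detect_renames(old_tree, new_tree, threshold=0.5):
    """Pair each deleted path with the most similar added path.

    Trees are dicts mapping path -> text content. A pair counts as a
    rename only if the similarity ratio reaches `threshold`.
    """
    deleted = {p: c for p, c in old_tree.items() if p not in new_tree}
    added = {p: c for p, c in new_tree.items() if p not in old_tree}

    renames = {}
    for old_path, old_content in deleted.items():
        best_path, best_score = None, threshold
        for new_path, new_content in added.items():
            if new_path in renames.values():
                continue  # already claimed by another deleted file
            score = SequenceMatcher(None, old_content, new_content).ratio()
            if score >= best_score:
                best_path, best_score = new_path, score
        if best_path is not None:
            renames[old_path] = best_path
    return renames


if __name__ == "__main__":
    old = {
        "util.py": "def add(a, b):\n    return a + b\n",
        "main.py": "print('hi')\n",
    }
    new = {
        "helpers.py": "def add(a, b):\n    return a + b  # sum\n",
        "main.py": "print('hi')\n",
    }
    print(detect_renames(old, new))
```

Once a rename is recognized, a merge can apply the other side's edits to the file under its new name instead of reporting a delete on one side and a modification on the other.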
Part of the reason is of course the technical argument that DVCSes store more information than SVN does (DAG, copies) and also have a simpler internal model, which is why they are able to perform more accurate merges, as mentioned in the other responses.
However, probably an even more important difference is that, because you have a local repository, you can make frequent, small commits and also frequently pull and merge incoming changes. This is caused more by the ‘human factor’: the differences in the way a human works with a centralised VCS versus a DVCS.
With SVN, if you update and there are conflicts, SVN will merge what it can and insert markers in your code where it can’t. The big problem with this is that your code will no longer be in a workable state until you resolve all the conflicts.
This distracts you from the work you are trying to achieve, so typically SVN users do not merge while they are working on a task. Combine this with the fact that SVN users also tend to let changes accumulate in a single large commit for fear of breaking other people’s working copies, and there will be long periods of time between the branch and the merge.
With Mercurial, you can merge with incoming changes much more frequently, in between your smaller incremental commits. This will by definition result in fewer merge conflicts, because you will be working on a more up-to-date codebase.
And if there turns out to be a conflict, you can decide to postpone the merge and do it at your own leisure. This in particular makes the merging so much less annoying.
Whoa, attack of the 5-paragraph essays!
In short, nothing makes it easy. It is hard, and my experience indicates that errors do occur. But:
DVCS forces you to deal with merging, which means taking a few minutes to familiarize yourself with the tools that exist to help you out. That alone helps.
DVCS encourages you to merge frequently, which helps too.
The snippet of hginit that you quoted, claiming that Subversion is unable to do three-way merges and that Mercurial merges by looking at all the changesets in both branches, is simply wrong on both counts.
One point is that svn merging is subtly broken (see http://blogs.open.collab.net/svn/2008/07/subversion-merg.html). I suspect this is in conjunction with svn recording mergeinfo even on cherry-picking merges. Add a few plain bugs in handling border cases, and svn, as the current poster child of CVCSs, makes them look bad compared to all the DVCSs, which just got it right.
I think the DAG of changesets, as mentioned by others, makes a big difference. DVCS:es require split history (and merges) at a fundamental level, whereas I suppose CVCS:es (which are older) were built from day 1 to track revisions and files first, with merge support being added as an afterthought.
So:
Mixing these concepts (as Subversion does) is, I believe, a big mistake. For instance, it has branches/tags inside the source tree, so there you have to track which revisions of files have been merged to other files. This is clearly more complex than just tracking which revisions have been merged.
So, summarizing:
At least this is what I feel from my experience with cvs, svn, git and hg. (There probably are other CVCS:es which have got this thing right too.)
One thing I find easier with a DVCS is that each developer can merge their own changes into whichever repository they desire. It's much easier to handle merge conflicts when you're merging your own code. I've worked in places where some poor soul had to fix merge conflicts by tracking down each developer involved.
Also with a DVCS you can do things like clone a repository, merge work from two developers into the clone, test the changes, then merge from the clone back into the main repository.
Pretty cool stuff.
As a historical note, the now-archaic PRCS system also knew about common ancestors and could merge efficiently, even though it wasn't distributed (it was built on top of RCS files!). It did mean, however, that it could be effectively migrated to git while retaining history.
Maybe DVCS users just never do the things that make merging hard, like refactorings that change and rename/copy most files in the project, or redesigning from scratch APIs that are used in hundreds of files.