Possible missing versions in Subversion (research environment)
This is my first question here, and I have not seen it addressed elsewhere. I work in a research institute, so we would like to be able to say which code version produced a particular set of results. My question is whether my analysis is correct as shown in the illustration below. (Note that the version node IDs are for illustration purposes only and do not correspond to actual version IDs in SVN, Git, or Hg. Version numbers with letters represent uncommitted code state in SVN, whole-number version IDs represent committed state in SVN, and all version IDs in the Git/Hg box represent committed code state.)
Example scenario:
Suppose there are two working copies, "A" and "B", that both start from revision 1.
"A" revises the default values in function foo(), generates results, and checks in that version (repository revision 2). "B" does not revise foo() but some other part of the code, uses the old default values to generate results, and attempts to check in the as-used version 1b. The commit fails because an update is needed, but in the process of merging version 2 and 1b, SVN will lose the fact that version 1b used different default values in foo(). This is not detected as a conflict since "A" and "B" did not change the same part of the code. Version 3 is not identical to version 1b, so replicability is not guaranteed.
I cannot simulate this scenario on my local drive using TortoiseSVN (I cannot create working copies because of an SVN Checkout error: "Unable to open an ra_local session to URL"). I do know for a fact that both Git and Hg will handle the situation properly and show version 1b in the history if it was committed and if the rebase feature was not used. (I believe rebase is essentially the normal behavior in SVN when no branches are involved.)
Is this analysis correct?
Comments (2)
Your analysis is correct in principle, even though I would object to the "version 1b" naming. Version 1b never exists in the SVN realm, because 1b is the state of a working directory before committing.
Your workflow has a fundamental problem: when you want reliable identification of results, you first have to acquire an identifier and only then produce results. Check in, then generate. If this causes reliability problems, check in to a branch, generate the results, and then merge. The branch-and-merge approach is similar to the way distributed VCS software like Git or Hg works, where the local repositories are implicit branches and the push is an implicit merge.
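One way to enforce "check in, then generate" is to refuse to label results unless the working copy is at a single, unmodified revision. The sketch below is only an illustration under assumptions: it relies on the `svnversion` command-line tool, whose output is a bare number for a clean single-revision working copy and carries a `:` range or a trailing letter such as `M` otherwise. The helper names `read_svnversion`, `is_single_clean_revision`, and `result_label` are hypothetical.

```python
import re
import subprocess


def read_svnversion(working_copy_path: str) -> str:
    """Run the svnversion tool on a working copy and return its trimmed output.

    Assumes svnversion is installed and on PATH.
    """
    completed = subprocess.run(
        ["svnversion", working_copy_path],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()


def is_single_clean_revision(version_text: str) -> bool:
    """True only for output that is a bare revision number, e.g. '4168'.

    Anything else (such as '4123:4168' for mixed revisions or '4168M' for
    local modifications) is treated as not identifiable.
    """
    return re.fullmatch(r"[0-9]+", version_text.strip()) is not None


def result_label(version_text: str) -> str:
    """Return a label such as 'r4168' to store next to the results."""
    if not is_single_clean_revision(version_text):
        raise ValueError(
            f"working copy state {version_text!r} is not a single committed "
            "revision; commit and update before generating results"
        )
    return "r" + version_text.strip()
```

A run script could call `result_label(read_svnversion("."))` before starting the experiment and write the returned label into the output directory, so that each set of results names exactly one committed revision.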
Yes, you are correct that Subversion has this problem. In fact, it's even worse than you think. Subversion operates on a per-file basis when it determines whether your working copy is out of date. So you can end up with the following:
A revises default values in foo() and re-runs the experiment. Let's say the change only affects results/output-0001.dat. A commits this as SVN revision 2.
B revises another part of the code and generates new results. Since B doesn't have the change from A, only results/output-1000.dat is changed by the rerun. B commits this as SVN revision 3.
B could commit without updating first since the changes he made did not intersect with the changes made by A. Furthermore, SVN revision 3 does not correspond to the working copy on either A's or B's machine! If professor C comes along and checks out SVN revision 3, he sees results/output-0001.dat with the results from A, and results/output-1000.dat with the results from B. This is highly inconsistent.
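The sequence above can be replayed with a toy model. This is only a sketch of the per-file out-of-date check as described here, not Subversion's actual implementation; the class `ToyRepo`, the file `other.c`, and the file contents are made up for illustration.

```python
class ToyRepo:
    """Toy repository that checks staleness per file, not per tree."""

    def __init__(self, files):
        self.trees = [None, dict(files)]  # list index = revision number
        self.last_changed = {name: 1 for name in files}

    @property
    def head(self):
        return len(self.trees) - 1

    def commit(self, base_revisions, changes):
        """Commit `changes`; `base_revisions` maps file -> revision in the working copy."""
        for name in changes:
            # Only the files being committed are checked for staleness.
            if self.last_changed[name] > base_revisions[name]:
                raise RuntimeError(f"{name} is out of date; update first")
        new_tree = dict(self.trees[self.head])
        new_tree.update(changes)
        self.trees.append(new_tree)
        for name in changes:
            self.last_changed[name] = self.head
        return self.head


initial = {
    "foo.c": "defaults v1",
    "other.c": "other v1",  # hypothetical file standing in for B's code change
    "results/output-0001.dat": "initial results",
    "results/output-1000.dat": "initial results",
}
repo = ToyRepo(initial)
base = {name: 1 for name in initial}  # A and B both start from revision 1
wc_a, wc_b = dict(initial), dict(initial)

changes_a = {"foo.c": "defaults changed by A",
             "results/output-0001.dat": "results from A"}
wc_a.update(changes_a)
rev_a = repo.commit(base, changes_a)

changes_b = {"other.c": "changed by B",
             "results/output-1000.dat": "results from B"}
wc_b.update(changes_b)
rev_b = repo.commit(base, changes_b)  # succeeds without an update

rev3_tree = repo.trees[rev_b]
print(rev3_tree == wc_a, rev3_tree == wc_b)
```

In this model the tree stored as revision 3 matches neither working copy: it holds A's foo.c and output-0001.dat together with B's other change and output-1000.dat, a combination nobody ever ran.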
The underlying concept that allows this is mixed-revision working copies. Subversion allows you to have files at different revisions in your working copy. When you create revision 2 with a change to foo.c, that file is marked as being at revision 2. The other files in the working copy remain at revision 1. This allows you to selectively update part of your working copy back to an old revision for debugging purposes, and it allows you to commit files without updating as long as no one else has touched those files.
Tools like Mercurial and Git will prevent you from doing this since they model the history as a DAG (directed acyclic graph). Each change becomes a new node in the graph and you must make an explicit merge commit to combine two changesets. In the scenario above, B would try to push his change and Mercurial would abort. He then has to pull A's changeset, merge it with his own, and commit the merge before he can push.
All three versions of the results are now stored in the history.
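The DAG behavior can be sketched with a toy model as well. This is an illustration loosely modeled on the Mercurial behavior described above, not Mercurial's real code; the class `ToyDag` and the changeset names are hypothetical.

```python
class ToyDag:
    """Toy shared repository that rejects pushes which add a new head."""

    def __init__(self):
        self.parents = {"rev1": ()}  # changeset -> tuple of parent changesets

    @staticmethod
    def heads(parents):
        """Heads are changesets that no other changeset names as a parent."""
        referenced = {p for ps in parents.values() for p in ps}
        return set(parents) - referenced

    def push(self, changesets):
        """Accept `changesets` (name -> parents) only if no new head appears."""
        candidate = dict(self.parents)
        candidate.update(changesets)
        if len(self.heads(candidate)) > len(self.heads(self.parents)):
            raise RuntimeError("push would create a new head; pull and merge first")
        self.parents = candidate


shared = ToyDag()
shared.push({"A": ("rev1",)})  # A's change on top of rev1

try:
    shared.push({"B": ("rev1",)})  # B's change is also based on rev1
    b_alone_accepted = True
except RuntimeError:
    b_alone_accepted = False

# After pulling A and merging locally, B pushes his change plus the merge.
shared.push({"B": ("rev1",), "merge": ("A", "B")})
print(sorted(shared.parents))
```

Because B's original changeset stays in the graph as a parent of the merge, the state he actually ran is kept alongside A's state and the merged state.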