Performing historical builds with Mercurial
Background
We use a central repository model to coordinate code submissions between all the developers on my team. Our automated nightly build system has a code submission cut-off of 3AM each morning, when it pulls the latest code from the central repo to its own local repository.
Some weeks ago, a build was performed that included Revision 1 of the repo. At that time, the build system did not in any way track the revision of the repository that was used to perform the build (it does now, thankfully).
-+------- Build Cut-Off Time
|
|
O Revision 1
An hour before the build cut-off time, a developer branched off the repository and committed a new revision in their own local copy. They did NOT push it back to the central repo before the cut-off and so it was not included in the build. This would be Revision 2 in the graph below.
-+------- Build Cut-Off Time
|
| O Revision 2
| |
| |
|/
|
O Revision 1
An hour after the build, the developer pushed their changes back to the central repo.
O Revision 3
|\
| |
-+-+----- Build Cut-Off Time
| |
| O Revision 2
| |
| |
|/
|
O Revision 1
So, Revision 1 made it into the build, while the changes in Revision 2 would've been included in the following morning's build (as part of Revision 3). So far, so good.
Problem
Now, today, I want to reconstruct the original build. The seemingly obvious steps to do this would be to
- determine the revision that was in the original build,
- update to that revision, and
- perform the build.
The problem comes with Step 1. In the absence of a separately recorded repository revision, how can I definitively determine what revision of the repo was used in the original build? All revisions are on the same named branch and no tags are used.
The log command
hg log --date "<cutoff_of_original_build" --limit 1
gives Revision 2 - not Revision 1, which was in the original build!
Now, I understand why it does this - Revision 2 is now the revision closest to the build cut-off time - but it doesn't change the fact that I've failed to identify the correct revision on which to rebuild.
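The mismatch follows from what a changeset date records: the time of the commit in the developer's local clone, not the time the changeset reached the central repo. A minimal Python sketch of the scenario illustrates this. The timestamps are made up, and the `arrived` field is hypothetical (Mercurial does not store it); the point is only that the two selections diverge:

```python
from datetime import datetime

# Toy model of the three revisions in the graph above.
# "committed" is the date a changeset records; "arrived" (when it reached
# the central repo) is a hypothetical field that Mercurial does not store.
revisions = [
    {"rev": 1, "committed": datetime(2011, 1, 9, 17, 0), "arrived": datetime(2011, 1, 9, 17, 5)},
    {"rev": 2, "committed": datetime(2011, 1, 10, 2, 0), "arrived": datetime(2011, 1, 10, 4, 0)},
    {"rev": 3, "committed": datetime(2011, 1, 10, 3, 55), "arrived": datetime(2011, 1, 10, 4, 0)},
]
cutoff = datetime(2011, 1, 10, 3, 0)  # made-up build cut-off


def latest_before(revs, cutoff, key):
    """Return the revision number with the latest `key` time before the cutoff."""
    candidates = [r for r in revs if r[key] < cutoff]
    return max(candidates, key=lambda r: r[key])["rev"]


by_commit_date = latest_before(revisions, cutoff, "committed")  # what a date filter sees
by_arrival = latest_before(revisions, cutoff, "arrived")        # what the build pulled
print(by_commit_date, by_arrival)
```

Selecting by commit date picks Revision 2, while selecting by arrival time picks Revision 1, which is the information the repository history alone cannot supply.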
Thus, if I can't use the --date option of the log command to find the correct historical version, what other means are available to determine the correct one?
Comments (3)
Considering whatever history might have been in the undo files is gone by now (the only thing I can think of that could give an indication), I think the only way to narrow it down to a specific revision will be a brute force approach.
If the range of possible revisions is fairly large, and the build output changes in size (or in some other non-date aspect) in a way that is linear or near enough to linear, you may be able to use the bisect command to do a binary search for the revision you're looking for (or at least get close to it). At each revision that bisect stops at, you would build that revision and compare whatever aspect you're using against what the scheduled build produced that night. Depending on the test, it might not even require building.
If it really is as simple as the graph you depict and the range of possibilities is short, you could just start from the latest revision it might be and walk backwards a few revisions, testing each against the original build.
As for a definitive test comparing the two builds, hashing the test build and comparing it to a hash of the original build might work. If a compile on the nightly build machine and a compile on your machine of the same revision do not produce binary-identical builds, you may have to use binary diffing (such as with xdelta or bsdiff) and look for the smallest diff.
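As a rough Python sketch of that comparison, the search below walks a list of candidate revisions, hashes each build, and falls back to "smallest difference" when no hash matches. The `build` callable is hypothetical (it stands for "update to the revision, build, return the artifact bytes"), and `byte_distance` is a crude stand-in for a real binary diff tool such as xdelta or bsdiff:

```python
import hashlib


def digest(data: bytes) -> str:
    """SHA-256 hex digest of a build artifact."""
    return hashlib.sha256(data).hexdigest()


def find_matching_revision(candidates, build, original_digest):
    """Return the first candidate whose build hashes to original_digest, else None."""
    for rev in candidates:
        if digest(build(rev)) == original_digest:
            return rev
    return None


def byte_distance(a: bytes, b: bytes) -> int:
    """Crude difference measure: differing positions plus the length difference."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def closest_revision(candidates, build, original_bytes):
    """Return the candidate whose build differs least from the original artifact."""
    return min(candidates, key=lambda rev: byte_distance(build(rev), original_bytes))


# Demo with a fake build function (hypothetical; a real one would run the build).
def fake_build(rev):
    return f"artifact built from revision {rev}".encode()


exact = find_matching_revision([3, 2, 1], fake_build, digest(fake_build(1)))
# Simulate a non-reproducible build: the original carries extra machine-specific bytes.
nearest = closest_revision([3, 2, 1], fake_build, fake_build(1) + b" @nightly")
print(exact, nearest)
```

The candidate list is ordered newest first here, matching the "walk backwards" suggestion; the same functions could equally be driven by the revisions that bisect selects.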
Mercurial does not have the information you want:
Mercurial does not, out of the box, make it its business to log and track every action performed on a repository, such as push, pull, and update. If it did, it would be producing a lot of logging information. It does make available hooks that can be used to do that if one so desires.
It also does not care what you do with the contents of the working directory, such as opening files or compiling, so of course it is not going to track that at all. It's simply not what Mercurial does.
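For illustration, a hook of that kind might look like the Python sketch below. Mercurial's `incoming` hook runs once for each changeset that arrives in a repository and passes the changeset ID to external hook commands in the `HG_NODE` environment variable; the script would be registered under `[hooks]` in the central repo's hgrc. The log file name and the line format are assumptions made for this example:

```python
import os
from datetime import datetime, timezone


def arrival_line(node, when):
    """Format one log line: '<UTC ISO timestamp> <changeset ID>'."""
    return f"{when.astimezone(timezone.utc).isoformat()} {node}\n"


def record_arrival(log_path, node, when=None):
    """Append an arrival record for `node` to the log file at `log_path`."""
    when = when or datetime.now(timezone.utc)
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(arrival_line(node, when))


if __name__ == "__main__":
    # When run as an external `incoming` hook, HG_NODE holds the changeset ID.
    node = os.environ.get("HG_NODE")
    if node:
        record_arrival("arrivals.log", node)  # hypothetical log file name
```

With such a log in place, the question "what had reached the central repo by 3AM?" becomes a matter of reading timestamps rather than rebuilding candidates.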
It was a mistake to not know exactly what the scheduled build was building. You agree implicitly because you now log that very information. The lack of that information before has simply come back to bite you, and there is no easy way out of it. Mercurial does not have the information you need. If the central repo is just a shared directory rather than a web-hosted repository that might have tracked activity, the only information about what was built is in the compiled version. Whether it is some metadata declared in the source that becomes part of the build, a naive aspect like filesize, or you truly are stuck hashing files, you can't get your answer without some effort.
Maybe you don't need to test every revision; there may be revisions you can be certain are not candidates. The time of the compile merely gives an upper bound on the range of revisions to test: you know that revisions committed after that time could not possibly be candidates. What you don't know is what had been pushed to the server at the time the build server pulled from it. But you do know that revisions from that day are the most likely. You also know that revisions in parallel unnamed branches are less likely candidates than linear revisions and merges. If there are a lot of parallel unnamed branches and you know all your developers merge in a particular way, you might know whether the revisions under parent1 or parent2 should be tested first.
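A small Python sketch of that narrowing, on a toy history with made-up timestamps. It assumes (and this is only an assumption about team habits) that merges keep the main line as parent1, so revisions on the first-parent chain are tried before those on side branches:

```python
from datetime import datetime

# rev -> (commit time, parent1, parent2); a toy history with made-up timestamps.
history = {
    1: (datetime(2011, 1, 9, 17, 0), None, None),
    2: (datetime(2011, 1, 10, 2, 0), 1, None),
    3: (datetime(2011, 1, 10, 3, 55), 1, 2),
}


def candidates(history, tip, build_time):
    """Revisions committed no later than build_time, first-parent chain first."""
    mainline = []
    rev = tip
    while rev is not None:          # follow parent1 links back from the tip
        mainline.append(rev)
        rev = history[rev][1]
    eligible = {r for r in history if history[r][0] <= build_time}
    on_main = [r for r in mainline if r in eligible]
    off_main = sorted((r for r in eligible if r not in mainline), reverse=True)
    return on_main + off_main


order = candidates(history, tip=3, build_time=datetime(2011, 1, 10, 3, 0))
print(order)
```

Here Revision 3 is excluded by the time bound, Revision 1 is tested first because it lies on the first-parent chain, and Revision 2 is kept only as a less likely candidate.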
Maybe you don't even need to compile if there is metadata you can parse from the source code to compare with what you know about the specific build.
And you can automate your search. It would be easiest to do so with a linear search: fewer heuristics to design.
The bottom line is simply that Mercurial does not have a magic button to help in this case.
Apologies, it's probably bad form to answer one's own question, but there wasn't enough room to properly respond in a comment box.
To Joel, a couple of things:
First - and I mean this sincerely - thanks for your response. You provided an option that was considered, but which was ultimately rejected because it would be too complex to apply to my build environment.
Second, you got a little preachy there. In the question, it was understood that because a separately recorded repository revision was absent, there would be 'some effort' to figure out the correct revision. In a response to Lance's comment (above), I agree that recording the 40-byte repository hash is the 'correct' way of archiving the necessary build info. However, this question was about what CAN be done IF you do not have that information.
To be clear, I posted my question on StackOverflow for two reasons:
Solution
In the end, perhaps my greatest thanks should go to Chris Morgan, who got me thinking to use the central server's mercurial-server logs. Using those logs, and some scripting, I was able to definitively determine the set of revisions that were pushed to the central repository at the time of the build. So, my thanks to Chris and to everyone else who responded.
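The scripting itself is not shown in the post. A rough Python sketch of the idea follows, assuming a simplified log of one `<ISO timestamp> <changeset ID>` line per pushed changeset. That format is a stand-in chosen for this example; the actual mercurial-server log layout is not given here and would need its own parser:

```python
from datetime import datetime


def pushed_before(log_lines, cutoff):
    """Return changeset IDs whose logged push time is at or before the cutoff."""
    nodes = []
    for line in log_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip lines that do not match the assumed format
        stamp, node = parts
        if datetime.fromisoformat(stamp) <= cutoff:
            nodes.append(node)
    return nodes


# Made-up sample data in the assumed format.
sample_log = [
    "2011-01-09T17:05:00 aaa111",
    "2011-01-10T04:00:00 bbb222",
    "2011-01-10T04:00:00 ccc333",
]
in_build = pushed_before(sample_log, datetime(2011, 1, 10, 3, 0))
print(in_build)
```

Only the changeset pushed before the 3AM cut-off survives the filter, which corresponds to Revision 1 in the graph from the question.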
As Joel said, it is not possible. However, there are certain solutions that can help you:
REVISION_FORKED.BRANCH_NUMBER.BRANCH_REVISION
so your change number 2 would be 1.1.1