How to use sed/awk or other tools to assist in searching and replacing in a 12GB Subversion dump file
I've got a particular situation where I need to remove the operations of a series of commits in a Subversion repository. The entire contents of (/trunk /tags /branches) were tagged and subsequently removed when the mistake was realized. I would simply use svndumpfilter to remove the offending nodes, but someone re-used the bad tag name at a later point, so path-based exclusions would cause other problems. I need to manually edit the dump file, which is 12GB.
I have a series of 15 sequential revisions I need to edit, which appear in the dump in the following format:
Revision-number: 60338
Prop-content-length: 143
Content-length: 143
K 7
svn:log
V 41
Tagging test prior to creating xx branch
K 10
svn:author
V 7
userx
K 8
svn:date
V 27
2009-05-27T15:01:31.812916Z
PROPS-END
Node-path: test/tags/XX_8_0_FINAL
Node-kind: dir
Node-action: add
Node-copyfrom-rev: 60337
Node-copyfrom-path: test
Based on testing I've done, I know I need the above section to change to the following:
Revision-number: 60338
Prop-content-length: 112
Content-length: 112
K 7
svn:log
V 38
This is an empty revision for padding.
K 8
svn:date
V 27
2009-05-27T15:01:31.812916Z
PROPS-END
There are 14 more revisions where the same replacement needs to take place.
Trying to edit the file manually in Vim is seriously impractical. The dump file is a mixture of binary and ASCII text.
If anyone has any awk/sed magic that could help me, I'd be really appreciative.
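The two length headers in the replacement record are byte counts of the property block (from the first K line through PROPS-END and its newline), so they can be computed instead of typed by hand for each of the 15 revisions. A minimal Python sketch, assuming the usual dump layout with a blank line after the headers and another after PROPS-END; the function name padding_revision and its parameters are made up for illustration:

```python
def padding_revision(revnum, date,
                     log="This is an empty revision for padding."):
    """Build an empty revision record (bytes) with correct length headers."""

    def prop(key, value):
        # One property: "K <keylen>\n<key>\nV <vallen>\n<value>\n"
        k = key.encode("utf-8")
        v = value.encode("utf-8")
        return b"K %d\n%s\nV %d\n%s\n" % (len(k), k, len(v), v)

    props = prop("svn:log", log) + prop("svn:date", date) + b"PROPS-END\n"
    header = (b"Revision-number: %d\n"
              b"Prop-content-length: %d\n"
              b"Content-length: %d\n\n" % (revnum, len(props), len(props)))
    # Assumption: a blank line separates this record from the next one.
    return header + props + b"\n"


record = padding_revision(60338, "2009-05-27T15:01:31.812916Z")
# For the revision shown above, both length headers come out as 112.
print(record.decode("ascii"))
```

Each of the remaining revisions would need its own svn:date value passed in, since the date is part of the property block.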
Comments (4)
First, a big caveat: sed and awk are designed to work on pure text files. If your files are a mixture of binary and ASCII, then I'm not confident that the following will work (personally I'd use Perl).
I assume that the "Revision-number: 60338" line is what you want to use as your trigger (and heaven help you if it occurs in the binary data). Put your revised section ("...This is an empty revision...") in a separate file called, say, newsection. Then:
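The command that followed is missing from this copy of the answer. As a stand-in, here is a minimal Python sketch of the same trigger idea, not the answerer's sed/awk code: copy the dump line by line in binary mode, and when a matching "Revision-number:" header is seen, write the replacement and drop everything up to the next revision header. The function name replace_revisions and the file names in the usage comment are hypothetical, and the sketch assumes no binary content happens to contain a line that starts with "Revision-number: " followed by a number.

```python
import io

REV_PREFIX = b"Revision-number: "


def replace_revisions(src, dst, replacements):
    """Copy src to dst, swapping whole revision records.

    src and dst are binary file objects. replacements maps a revision
    number to the bytes that stand in for that revision's record
    (headers, properties and node records up to the next revision).
    """
    skipping = False
    for line in src:  # binary mode: split on b"\n" only, no decoding
        if line.startswith(REV_PREFIX):
            try:
                rev = int(line[len(REV_PREFIX):].strip())
            except ValueError:
                rev = None  # starts like the trigger but is not a header
            if rev is not None:
                skipping = rev in replacements
                if skipping:
                    dst.write(replacements[rev])
                    continue
        if not skipping:
            dst.write(line)


# Tiny in-memory demonstration with fake records and some binary bytes.
demo_in = io.BytesIO(
    b"Revision-number: 1\nA\n\x00\xff\n"
    b"Revision-number: 2\nB\nNode-path: x\n"
    b"Revision-number: 3\nC\n"
)
demo_out = io.BytesIO()
replace_revisions(demo_in, demo_out, {2: b"Revision-number: 2\nPAD\n\n"})
print(demo_out.getvalue())

# On real files (names are placeholders):
# with open("repo.dump", "rb") as src, open("repo.fixed.dump", "wb") as dst:
#     with open("newsection", "rb") as f:
#         replace_revisions(src, dst, {60338: f.read()})
```

Because the file is streamed, memory use stays small even for a 12GB dump; the replacement bytes must already carry correct Prop-content-length and Content-length values.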
How about SvnDumpTool? You might be able to join the initial "good" part with the incrementally dumped edited parts.
I ended up using the following steps:
This gave me the exact line numbers in the file of the node-operations for the "bad" commits.
I then used sed to remove the range of node operations (by line number) for each commit as follows:
This proved to be pretty fast.
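The commands themselves did not survive in this copy. The two steps described (find the line numbers, then delete ranges by line number) are what grep -n and a sed 'M,Nd' expression do; below is a minimal binary-safe Python sketch of the same two steps, not the poster's actual commands. The function names and the sample data are hypothetical.

```python
import io


def find_line_numbers(src, prefix):
    """Return 1-based numbers of lines starting with prefix (like grep -n)."""
    return [n for n, line in enumerate(src, start=1)
            if line.startswith(prefix)]


def delete_line_ranges(src, dst, ranges):
    """Copy src to dst, dropping lines inside any (first, last) range.

    Line numbers are 1-based and inclusive, as in sed 'first,lastd'.
    Ranges are assumed not to overlap.
    """
    pending = sorted(ranges)
    i = 0
    for n, line in enumerate(src, start=1):
        # Move past ranges that end before the current line.
        while i < len(pending) and n > pending[i][1]:
            i += 1
        if i < len(pending) and pending[i][0] <= n <= pending[i][1]:
            continue  # inside a range to delete
        dst.write(line)


# Fake two-revision dump; lines 4-6 are the node record of revision 1.
data = (b"Revision-number: 1\nPROPS-END\n\n"
        b"Node-path: a\n\x00\x01\n\n"
        b"Revision-number: 2\nPROPS-END\n\n")

print(find_line_numbers(io.BytesIO(data), b"Revision-number: "))
print(find_line_numbers(io.BytesIO(data), b"Node-path: "))

out = io.BytesIO()
delete_line_ranges(io.BytesIO(data), out, [(4, 6)])
print(out.getvalue())
```

Both passes read the file sequentially, which fits the observation that working by line number was fast.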
For those curious, the reason I needed to remove these completely was that our repository scanner (Atlassian Fisheye) was taking days to index the bad commits. I was using exclusion rules that SHOULD have worked around the issue, but it turned out I had uncovered a bug in exclusion rules that is due to be fixed in the next release of Fisheye.
See:
http://jira.atlassian.com/browse/FE-2752
Do those commits contain confidential material, or what's the reason to remove them? Why not leave them in the repository, remove the tags/branches, and that's it? EDIT: I overlooked that you already removed the tags/branches...