How can I use sed/awk or another tool to search and replace in a 12GB Subversion dump file?

Posted 2024-09-15 10:18:43

I've got a particular situation where I need to remove the operations of a series of commits in a Subversion repository. The entire contents of (/trunk /tags /branches) were tagged by mistake and removed once the mistake was noticed. I would simply use svndumpfilter to remove the offending nodes, but someone later re-used the bad tag name, so path-based exclusions would cause other problems. I need to manually edit the dump file, which is 12GB.
I have a series of 15 sequential revisions I need to edit, which appear in the dump in the following format:

Revision-number: 60338
Prop-content-length: 143
Content-length: 143

K 7
svn:log
V 41
Tagging test prior to creating xx branch
K 10
svn:author
V 7
userx
K 8
svn:date
V 27
2009-05-27T15:01:31.812916Z
PROPS-END

Node-path: test/tags/XX_8_0_FINAL
Node-kind: dir
Node-action: add
Node-copyfrom-rev: 60337
Node-copyfrom-path: test

Based on testing I've done, I know I need the above section to change to the following:

Revision-number: 60338
Prop-content-length: 112
Content-length: 112

K 7
svn:log
V 38
This is an empty revision for padding.
K 8
svn:date
V 27
2009-05-27T15:01:31.812916Z
PROPS-END
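
The two length headers count the bytes of the property block, from the first `K` line through the newline after `PROPS-END`. As a minimal Python sketch (the helper names `encode_props` and `revision_record` are made up for illustration, not part of any Subversion tool), the replacement record above can be rebuilt so that the 112 is computed rather than counted by hand:

```python
def encode_props(props):
    """Serialize a dict of bytes -> bytes as a dump-file property block."""
    out = bytearray()
    for key, value in props.items():
        # Each property is "K <key length>", the key, "V <value length>", the value.
        out += b"K %d\n%s\nV %d\n%s\n" % (len(key), key, len(value), value)
    out += b"PROPS-END\n"
    return bytes(out)


def revision_record(number, props):
    """Build a revision record whose length headers match its property block."""
    block = encode_props(props)
    header = (
        b"Revision-number: %d\n"
        b"Prop-content-length: %d\n"
        b"Content-length: %d\n\n" % (number, len(block), len(block))
    )
    return header + block + b"\n"


# Values copied from the replacement block shown above.
record = revision_record(60338, {
    b"svn:log": b"This is an empty revision for padding.",
    b"svn:date": b"2009-05-27T15:01:31.812916Z",
})
print(record.decode("ascii"))  # both length headers come out as 112
```

Changing the log message changes both length headers, so they are best derived from the block itself.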

There are 14 more revisions where the same replacement needs to take place.
Trying to edit the file manually in Vim is seriously impractical. The dump file is a mixture of binary and ASCII text.
If anyone has any awk/sed magic that could help me, I'd really appreciate it.

Comments (4)

稀香 2024-09-22 10:18:43

First, a big caveat: sed and awk are designed to work on plain text files. If your file is a mixture of binary and ASCII, I'm not confident that the following will work (personally, I'd use Perl).

I assume that "Revision-number: 60338" is what you want to use as your trigger (and heaven help you if it also occurs in the binary data). Put your revised section ("...This is an empty revision...") in a separate file called, say, newsection. Then:

sed -e '/^Revision-number: 60338$/r newsection' -e '/^Revision-number: 60338$/,/^Node-copyfrom-path: test$/d' bigfilename

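For the binary-data worry, one alternative is to walk the dump record by record and skip each body by its declared `Content-length`, so a trigger string inside file contents is never examined. The following is a rough Python sketch, not a tested migration tool. The function names are hypothetical, each record body is held in memory, and it assumes `svnadmin load` tolerates a different number of blank lines between records, which should be checked on a copy first.

```python
def read_record(src):
    """Return (header_lines, body) for the next record, or None at end of input."""
    line = src.readline()
    while line == b"\n":                      # blank lines separate records
        line = src.readline()
    if not line:
        return None
    headers = []
    while line not in (b"\n", b""):           # headers end at the first blank line
        headers.append(line)
        line = src.readline()
    length = 0
    for header in headers:
        if header.startswith(b"Content-length:"):
            length = int(header.split(b":", 1)[1].strip())
    return headers, src.read(length)          # body is read by length, never scanned


def parse_props(block):
    """Parse 'K n / key / V n / value' pairs up to PROPS-END."""
    props, pos = {}, 0
    while not block.startswith(b"PROPS-END", pos):
        pair = []
        for tag in (b"K ", b"V "):
            eol = block.index(b"\n", pos)
            if block[pos:pos + 2] != tag:
                raise ValueError("unexpected property line")
            size = int(block[pos + 2:eol])
            pair.append(block[eol + 1:eol + 1 + size])
            pos = eol + 1 + size + 1          # skip the data and its newline
        props[pair[0]] = pair[1]
    return props


def encode_props(props):
    """Serialize a dict of bytes -> bytes as a property block."""
    parts = [b"K %d\n%s\nV %d\n%s\n" % (len(k), k, len(v), v) for k, v in props.items()]
    return b"".join(parts) + b"PROPS-END\n"


def blank_revisions(src, dst, revisions, log=b"This is an empty revision for padding."):
    """Copy src to dst, turning the listed revisions into empty padding revisions."""
    dropping = False
    for headers, body in iter(lambda: read_record(src), None):
        if headers[0].startswith(b"Revision-number:"):
            number = int(headers[0].split(b":", 1)[1].strip())
            dropping = number in revisions
            if dropping:
                new_props = {b"svn:log": log}
                date = parse_props(body).get(b"svn:date")
                if date is not None:          # keep the original commit date
                    new_props[b"svn:date"] = date
                body = encode_props(new_props)
                headers = [headers[0],
                           b"Prop-content-length: %d\n" % len(body),
                           b"Content-length: %d\n" % len(body)]
        elif dropping:
            continue                          # node record of a blanked revision
        dst.write(b"".join(headers) + b"\n" + body + b"\n")


# Example call (file names are placeholders; list all 15 revision numbers):
# with open("bigfilename", "rb") as src, open("edited.dump", "wb") as dst:
#     blank_revisions(src, dst, {60338})
```

The design choice here is to trust the length headers instead of pattern matching, at the cost of more code than a sed one-liner.
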
负佳期 2024-09-22 10:18:43

How about SvnDumpTool? You might be able to join the initial "good" part with the incrementally dumped edited parts.

锦爱 2024-09-22 10:18:43

I ended up using the following steps:

grep -a -n -C 250 "Revision-number: xxxxx" dump.file

This gave me the exact line numbers in the file of the node operations for the "bad" commits.
I then used sed to remove the range of node operations (by line number) for each commit, as follows (the line numbers and the output name edited.dump are placeholders):

sed -e "123,456d" -e "567,890d" dump.file > edited.dump

This proved to be pretty fast.
For those curious, the reason I needed to remove these completely was that our repository scanner (Atlassian Fisheye) was taking days to index the bad commits. I was using exclusion rules that SHOULD have worked around the issue, but it turned out I had uncovered a bug in the exclusion rules that is due to be fixed in the next release of Fisheye.
See:
http://jira.atlassian.com/browse/FE-2752
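
The same two steps (find the line numbers, then delete inclusive line ranges) can also be sketched in Python with the file opened in binary mode. This is only an illustration with made-up function names and a tiny in-memory sample. Like the grep/sed approach, it treats every `b"\n"` byte as a line ending, so the ranges still have to be checked by eye before deleting anything.

```python
import io


def find_lines(src, needle):
    """Yield (line_number, line) for lines containing needle, 1-based like grep -n."""
    for number, line in enumerate(src, start=1):
        if needle in line:
            yield number, line


def delete_line_ranges(src, dst, ranges):
    """Copy src to dst, skipping inclusive 1-based ranges, like sed 'M,Nd'."""
    for number, line in enumerate(src, start=1):
        if not any(first <= number <= last for first, last in ranges):
            dst.write(line)


# Tiny stand-in for a dump; a real run would use open("dump.file", "rb").
sample = io.BytesIO(
    b"Revision-number: 1\n"
    b"keep\n"
    b"Revision-number: 2\n"
    b"Node-path: bad\n"
    b"Node-kind: dir\n"
    b"\n"
    b"Revision-number: 3\n"
)
hits = [number for number, _ in find_lines(sample, b"Revision-number: ")]
print(hits)  # [1, 3, 7]

sample.seek(0)
edited = io.BytesIO()
delete_line_ranges(sample, edited, [(4, 5)])  # drop the node lines of revision 2
print(edited.getvalue().decode("ascii"))
```

Both functions stream the input one line at a time, so memory use depends on the longest run of bytes without a newline rather than on the size of the dump.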

巨坚强 2024-09-22 10:18:43

Do those commits contain confidential material, or what's the reason to remove them? Why not leave them in the repository, remove the tags/branches, and be done with it? EDIT: I overlooked that you had already removed the tags/branches...
