How to use sed/awk or other tools to assist with search and replace in a 12GB Subversion dump file

Posted on 2024-09-15 10:18:43

I've got a particular situation where I need to remove the operations of a series of commits in a Subversion repository. The entire contents of /trunk, /tags and /branches were tagged and subsequently removed when the mistake was realized. I would simply use svndumpfilter to remove the offending nodes, but someone re-used the bad tag name at a later point, so path-based exclusions would cause other problems. I need to manually edit the dump file, which is 12GB.
I have a series of 15 sequential revisions I need to edit, which appear in the dump in the following format:

Revision-number: 60338
Prop-content-length: 143
Content-length: 143

K 7
svn:log
V 41
Tagging test prior to creating xx branch
K 10
svn:author
V 7
userx
K 8
svn:date
V 27
2009-05-27T15:01:31.812916Z
PROPS-END

Node-path: test/tags/XX_8_0_FINAL
Node-kind: dir
Node-action: add
Node-copyfrom-rev: 60337
Node-copyfrom-path: test

Based on testing I've done, I know I need the above section to change to the following:

Revision-number: 60338
Prop-content-length: 112
Content-length: 112

K 7
svn:log
V 38
This is an empty revision for padding.
K 8
svn:date
V 27
2009-05-27T15:01:31.812916Z
PROPS-END

There are 14 more revisions where the same replacement needs to take place.
Trying to edit the file manually in Vim is seriously impractical. The dump file is a mixture of binary data and ASCII text.
If anyone has any awk/sed magic that could help me, I'd be really appreciative.
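
The length headers in the replacement section can be checked instead of counted by hand. The following is a minimal sketch, assuming a POSIX shell with printf, wc and tr; the helper name prop_block is made up for illustration. It rebuilds the replacement property block from the question byte for byte and counts it, which should agree with the "V 38" and "Prop-content-length: 112" values shown above.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: emits the replacement property block from the question.
# Each "K n" / "V n" line gives the byte length of the key or value after it.
prop_block() {
    printf 'K 7\nsvn:log\nV 38\n%s\n' 'This is an empty revision for padding.'
    printf 'K 8\nsvn:date\nV 27\n%s\n' '2009-05-27T15:01:31.812916Z'
    printf 'PROPS-END\n'
}

# Byte length of the log message alone; this is the number after "V".
log_len=$(printf '%s' 'This is an empty revision for padding.' | wc -c | tr -d ' ')

# Byte length of the whole block, including the PROPS-END line.
prop_len=$(prop_block | wc -c | tr -d ' ')

echo "V $log_len"
echo "Prop-content-length: $prop_len"
echo "Content-length: $prop_len"
```

Content-length matches Prop-content-length here because the record carries properties only and no text content.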

稀香 2024-09-22 10:18:43

First, a big caveat: sed and awk are designed to work on pure text files. If your files are a mixture of binary and ASCII, then I'm not confident that the following will work (personally I'd use Perl).

I assume that the "Revision-number: 60338" is what you want to use as your trigger (and heaven help you if it occurs in the binary). Put your revised section ("...This is an empty revision...") in a separate file called, say, newsection. Then:

sed -e '/^Revision-number: 60338$/r newsection' -e '/^Revision-number: 60338$/,/^Node-copyfrom-path: test$/d' bigfilename
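
Before running this against the 12GB file, it may be worth trying the same command on a small stand-in. The sketch below assumes GNU sed and a POSIX shell. The file names sample.dump and fixed.dump are invented for the test, and LC_ALL=C is an addition of mine (not part of the answer) intended to make sed treat the input as raw bytes. The sample only imitates the text layout from the question and is not a loadable dump.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"

# Replacement section, copied from the question.
cat > newsection <<'EOF'
Revision-number: 60338
Prop-content-length: 112
Content-length: 112

K 7
svn:log
V 38
This is an empty revision for padding.
K 8
svn:date
V 27
2009-05-27T15:01:31.812916Z
PROPS-END
EOF

# Small stand-in for the real dump (layout only, not loadable by svnadmin).
cat > sample.dump <<'EOF'
Revision-number: 60338
Prop-content-length: 143
Content-length: 143

K 7
svn:log
V 41
Tagging test prior to creating xx branch
K 10
svn:author
V 7
userx
K 8
svn:date
V 27
2009-05-27T15:01:31.812916Z
PROPS-END

Node-path: test/tags/XX_8_0_FINAL
Node-kind: dir
Node-action: add
Node-copyfrom-rev: 60337
Node-copyfrom-path: test


Revision-number: 60339
Prop-content-length: 10
Content-length: 10

PROPS-END

EOF

# "r" queues newsection for output; the range "d" then drops the old lines.
LC_ALL=C sed -e '/^Revision-number: 60338$/r newsection' \
    -e '/^Revision-number: 60338$/,/^Node-copyfrom-path: test$/d' \
    sample.dump > fixed.dump

cat fixed.dump
```

Writing to a new file instead of editing in place keeps the original dump untouched until the result has been test-loaded into a scratch repository.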

负佳期 2024-09-22 10:18:43

What about SvnDumpTool? You may be able to combine the initial "good" part with an edited part taken from an incremental dump.

锦爱 2024-09-22 10:18:43

I ended up using the following steps:

grep -a -n -C 250 "Revision-number: xxxxx" dump.file

This gave me the exact line numbers in the file of the node-operations for the "bad" commits.
I then used sed to remove the range of node operations (by line number) for each commit as follows:

sed -e "123,456d" -e "234,456d"

This proved to be pretty fast.
For those curious, the reason I needed to remove these completely was that our repository scanner (Atlassian Fisheye) was taking days to index the bad commits. I was using exclusion rules that should have worked around the issue, but it turned out I had uncovered a bug in the exclusion rules that is due to be fixed in the next release of Fisheye.
See:
http://jira.atlassian.com/browse/FE-2752
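
The line-number approach can be sketched end to end on a small sample. This assumes a grep that supports -a (treat binary as text) and -n (print line numbers), as GNU and BSD grep do. The search patterns, the variable names and the temporary file are illustrative assumptions, not the exact commands used above. The sample imitates only the text layout of a dump.

```shell
#!/bin/sh
set -eu
dump=$(mktemp)

# Small stand-in for the dump: one revision with a node operation to drop.
printf '%s\n' \
    'Revision-number: 60338' \
    'Prop-content-length: 10' \
    'Content-length: 10' \
    '' \
    'PROPS-END' \
    '' \
    'Node-path: test/tags/XX_8_0_FINAL' \
    'Node-kind: dir' \
    'Node-action: add' \
    'Node-copyfrom-rev: 60337' \
    'Node-copyfrom-path: test' \
    '' \
    '' \
    'Revision-number: 60339' \
    'Prop-content-length: 10' \
    'Content-length: 10' \
    '' \
    'PROPS-END' \
    '' > "$dump"

# First and last line of the node operation, as "N:text" cut down to N.
start=$(LC_ALL=C grep -a -n '^Node-path: test/tags/XX_8_0_FINAL$' "$dump" \
    | head -n 1 | cut -d: -f1)
end=$(LC_ALL=C grep -a -n '^Node-copyfrom-path: test$' "$dump" \
    | head -n 1 | cut -d: -f1)

# Delete that range by line number, writing the result to a new file.
LC_ALL=C sed -e "${start},${end}d" "$dump" > "$dump.fixed"

echo "deleted lines $start-$end"
```

Both grep and sed count lines by newline bytes, so the numbers reported by one should line up with the other even when binary content sits between the text headers.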
