How to efficiently replace the last occurrence of a pattern in a large file
Given a file with the following contents:
<root>
<a></a>
<b></b>
</root>
The command should output:
<root>
<a></a>
<b></b>
Things I've tried using the GNU Win32 port of sed:
Remove the last two lines. This is fast, but it assumes </root> is the second-to-last line and will cause a bug if it's not.
sed -e '$d' test.xml | sed -e '$d'
Substitute all occurrences of </root> with an empty string. This works, but it is slower than the first solution and will break if there are nested <root> elements (unlikely).
sed -e 's|</root>||' test.xml
The file I'm dealing with can be large so efficiency is important.
Is there a way to limit sed substitution to the last occurrence in the file? Or is there some other utility that would be faster?
Comments (5)
Using Perl with File::Backwards (presumably the CPAN module File::ReadBackwards) should be very fast (relative, I know, but still...). Perlfaq5 has a topic on going through a file backwards and removing lines. You can check for your pattern using that topic's code as a starting point.
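The idea of working from the end of the file can also be tried without Perl. The following is only a sketch, not the Perl approach from this answer: it assumes GNU tac and GNU sed are available (the `0,/regexp/` address is a GNU extension), and the function name is made up for illustration. Unlike a true backwards reader it still streams the whole file, so it shows the technique rather than a guaranteed speedup.

```shell
# Hypothetical helper; assumes GNU tac and GNU sed.
# tac reverses the line order, so the LAST line containing </root>
# becomes the FIRST one sed sees. The address 0,/<\/root>/ confines the
# substitution to the lines up to and including that first match, and the
# greedy \(.*\) makes the s command drop the rightmost </root> on that line.
# The second tac restores the original order.
strip_last_closing_tag() {
  tac "$1" | sed '0,/<\/root>/s/\(.*\)<\/root>/\1/' | tac
}
```

It could be called as `strip_last_closing_tag test.xml > out.xml`. If the tag sits alone on its line, an empty line is left behind, and the input is assumed to end with a newline so that tac does not join lines.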
With sed:
How about using awk for this?
AWK:
The first /pattern/{action} statement looks for lines that contain only </root>. If the pattern matches, the action skips the line.
The second /pattern/{action} statement looks for lines containing </root> anywhere in the line. If the pattern matches, the sub function replaces it with nothing and prints the rest of the line.
The third action, which is 1, is true for all lines that do not contain </root>, so it prints them unchanged.
I did a quick test and this was the result -
Test:
SED:
This should also work, though it will remove all </root> and not just the last occurrence.
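The awk command itself is not preserved in this answer, so the following is a reconstruction from the three-statement description only, not the answerer's exact code. The function name is hypothetical. Note that, as written, it acts on every line containing </root>, not just the last one.

```shell
# Hypothetical reconstruction of the three /pattern/{action} statements.
strip_root_awk() {
  awk '
    # 1) A line consisting only of </root>: skip it entirely.
    /^<\/root>$/ { next }
    # 2) </root> somewhere in the line: remove it, print what is left.
    /<\/root>/   { sub(/<\/root>/, ""); print; next }
    # 3) Every other line: the always-true pattern 1 prints it unchanged.
    1
  ' "$1"
}
```

A call such as `strip_root_awk test.xml` on the sample file from the question would print the three remaining lines.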
This might work for you:
This assumes that each <root> tag is matched with a closing </root> tag and that these tags occur on separate lines (as per the example).
Explanation:
[...] </root> tag and an opening <root> tag or end-of-file.
[...] </root> tag, save it in the hold space (HS) and then delete it and start a new cycle.
[...] <root> tag, swap to the HS and print out its contents.
[...] </root> tag and last line of the file, swap to the HS, delete the first line, i.e. the closing </root> tag, and print the remainder.
An alternative solution with two passes:
Explanation:
[...] line numbers of </root> tags
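The two-pass command is not preserved here, so this is only a guess at its shape based on the surviving explanation: a first pass collects the line numbers of the </root> tags, and a second pass deletes the last of them. It relies on the same assumption stated in this answer, namely that the tag sits on its own line. The function name is invented for the example.

```shell
# Hypothetical two-pass sketch; assumes </root> occupies a whole line.
delete_last_tag_line() {
  file=$1
  # Pass 1: grep -n prefixes each match with "LINENO:"; keep the last one.
  n=$(grep -nF '</root>' "$file" | tail -n 1 | cut -d: -f1)
  if [ -z "$n" ]; then
    # No closing tag found: pass the file through unchanged.
    cat "$file"
    return
  fi
  # Pass 2: delete exactly that line.
  sed "${n}d" "$file"
}
```

Reading the file twice costs extra I/O, but neither pass needs to keep more than one line in memory.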
Use the time command to see which one is more efficient. sed should be efficient.
In my opinion, nothing is faster than grep. Try it with awk's index() to see whether it is any faster.
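One way the index() suggestion might be tried is sketched below. It is an assumption about what was meant, not code from this answer: a single awk pass that uses index() (a plain substring search, no regular expression) and buffers only the lines seen since the most recent match, so that the last occurrence can be removed at the end. The function and variable names are hypothetical.

```shell
# Hypothetical single-pass sketch using awk's index() instead of a regex.
strip_last_with_index() {
  awk -v tag='</root>' '
    {
      if (index($0, tag)) {
        # A newer match: lines buffered so far cannot hold the last
        # occurrence, so print them and restart the buffer here.
        for (i = 1; i <= n; i++) print buf[i]
        n = 0
        buf[++n] = $0
      } else if (n > 0) {
        buf[++n] = $0      # after a match: hold the line back
      } else {
        print              # before any match: print immediately
      }
    }
    END {
      if (n == 0) exit
      # buf[1] is the last line containing the tag; find the rightmost hit.
      line = buf[1]; rest = line; off = 0; pos = 0
      while ((p = index(rest, tag)) > 0) {
        pos = off + p
        off += p
        rest = substr(rest, p + 1)
      }
      print substr(line, 1, pos - 1) substr(line, pos + length(tag))
      for (i = 2; i <= n; i++) print buf[i]
    }
  ' "$1"
}
```

Running each candidate under the shell's time, for example `time strip_last_with_index test.xml > /dev/null`, is a simple way to compare them on a real file. Memory use grows with the number of lines after the last match, which is small when the closing tag is near the end of the file.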