How to find a pattern and its surrounding content in a very large single-line file?
I have a very large file (100 MB+) where all the content is on one line.
I wish to find a pattern in that file and a number of characters around that pattern.
For example, I would like to call a command like the one below, but where -A and -B are numbers of bytes rather than lines:
cat very_large_file | grep -A 100 -B 100 somepattern
So for a file containing content like this:
1234567890abcdefghijklmnopqrstuvwxyz
With a pattern of
890abc
and a before size of -B 3
and an after size of -A 3
I want it to return:
567890abcdef
Any tips would be great.
Many thanks.
Comments (3)
You could try the -o option:
and use a regular expression to match your pattern and the 3 preceding/following characters i.e.
In the example you gave, it would be
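A minimal sketch of the approach described above, assuming a grep that supports -o (such as GNU grep) together with extended regular expressions via -E. The file name sample.txt is an illustrative stand-in for the large file, and `.{0,3}` is used in place of a fixed `.{3}` so that a match closer than 3 characters to the start or end of the file is still reported:

```shell
# Create a one-line sample file (illustrative name) with the question's content.
printf '1234567890abcdefghijklmnopqrstuvwxyz' > sample.txt

# -o prints only the matched text instead of the whole (huge) line.
# .{0,3} on each side captures up to 3 characters of context.
grep -o -E '.{0,3}890abc.{0,3}' sample.txt
# prints: 567890abcdef
```

Each match is printed on its own line. In a multibyte locale `.` matches characters rather than bytes; running the command with LC_ALL=C should make the counts byte-based, though that is worth verifying on your system.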
Another one with sed (you may need it on systems where GNU grep is not available):
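A sketch of how this could look with sed, assuming only POSIX basic regular expressions (the `\{3\}` interval syntax). The file name sample.txt is illustrative. Because the leading `.*` is greedy, this form prints only the last match on a line, which for a single-line file means at most one result:

```shell
# Create a one-line sample file (illustrative name) with the question's content.
printf '1234567890abcdefghijklmnopqrstuvwxyz' > sample.txt

# -n suppresses default output; the p flag prints only when a substitution happened.
# The capture group keeps 3 characters before and after the pattern.
sed -n 's/.*\(.\{3\}890abc.\{3\}\).*/\1/p' sample.txt
# prints: 567890abcdef
```

If the pattern can occur several times, the grep -o approach is probably the better fit, since it reports every match.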
Best way I can think of doing this is with a tiny Perl script.
You would then execute it thusly:
EDIT: Dang, Paolo's solution is much easier. Oh well, viva la Perl!
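A hypothetical sketch of what such a tiny Perl script could look like, not the script this answer originally contained. The script name context.pl, its argument order (PATTERN BEFORE AFTER FILE), and the file name sample.txt are all assumptions made for illustration. It reads the whole file into memory, which should be acceptable for a file of around 100 MB:

```shell
# Create a one-line sample file (illustrative name) with the question's content.
printf '1234567890abcdefghijklmnopqrstuvwxyz' > sample.txt

# Hypothetical script; usage: perl context.pl PATTERN BEFORE AFTER FILE
cat > context.pl <<'EOF'
#!/usr/bin/perl
use strict;
use warnings;

my ($pattern, $before, $after, $file) = @ARGV;
open(my $fh, '<:raw', $file) or die "Cannot open $file: $!";
local $/;                 # slurp mode: read the whole file at once
my $data = <$fh>;
close($fh);

# \Q...\E treats the pattern as a literal string; /s lets . match newlines.
# Print each match with up to $before/$after bytes of context.
while ($data =~ /(.{0,$before}\Q$pattern\E.{0,$after})/sg) {
    print "$1\n";
}
EOF

perl context.pl 890abc 3 3 sample.txt
# prints: 567890abcdef
```

Opening the file with the `:raw` layer means the before and after sizes are counted in bytes, which is what the question asks for.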