How do I find a pattern and its surrounding content in a very large single-line file?

Posted on 2024-12-08 00:24:13

I have a very large file (100 MB+) where all the content is on one line.
I want to find a pattern in that file, along with a number of characters around that pattern.

For example, I would like to run a command like the one below, but with -A and -B counting bytes rather than lines:

cat very_large_file | grep -A 100 -B 100 somepattern

So for a file containing content like this:

1234567890abcdefghijklmnopqrstuvwxyz

With a pattern of

890abc
and a before size of -B 3 
and an after size of -A 3

I want it to return:

567890abcdef

Any tips would be great.
Many thanks.

来世叙缘 2024-12-15 00:24:13

You could try the -o option:

-o, --only-matching
      Show only the part of a matching line that matches PATTERN.

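and use a regular expression that matches your pattern plus the 3 preceding and following characters (write .{0,3} instead of .{3} if a match may sit fewer than 3 characters from the start or end of the line), i.e.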

grep -o -P ".{3}pattern.{3}" very_large_file

In the example you gave, it would be

echo "1234567890abcdefghijklmnopqrstuvwxyz" > tmp.txt
grep -o -P ".{3}890abc.{3}" tmp.txt
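
The same idea can be wrapped in a small shell function so the context sizes become arguments. This is only a minimal sketch: the function name context_grep is hypothetical, it assumes a grep that supports -o (GNU and BSD grep do, but -o is not required by POSIX), and it uses -E with bounded repetition instead of -P. Setting LC_ALL=C is meant to make "." match one byte rather than one multibyte character, which is closer to the byte counts asked for in the question.

```shell
#!/bin/sh
# Hypothetical helper: print every match of PATTERN together with up to
# BEFORE bytes of leading context and up to AFTER bytes of trailing context.
# Usage: context_grep PATTERN BEFORE AFTER [FILE...]
context_grep() {
  pattern=$1 before=$2 after=$3
  shift 3
  # In the C locale "." matches a single byte, not a multibyte character.
  # {0,N} lets matches near the start or end of the line still be reported.
  LC_ALL=C grep -oE ".{0,${before}}${pattern}.{0,${after}}" "$@"
}
```

With the tmp.txt file created above, context_grep 890abc 3 3 tmp.txt prints 567890abcdef.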

眼眸印温柔 2024-12-15 00:24:13

Another option uses sed (you may need it on systems where GNU grep is not available). Because the leading .* is greedy, this prints only the last match on the line:

sed -n '
  s/.*\(...890abc...\).*/\1/p
  ' infile
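
If every occurrence is needed, or the context is too large to express comfortably as a regular expression, another route is to ask grep for the byte offset of each match and then cut the surrounding bytes out of the file. The following is a sketch under stated assumptions, not a definitive implementation: the function name byte_context is hypothetical, the pattern is treated as a fixed ASCII string, grep is assumed to support -o and -b together (as GNU grep does), and head is assumed to support -c.

```shell
#!/bin/sh
# Hypothetical helper: for every occurrence of the fixed string PATTERN in
# FILE, print BEFORE bytes, the match itself, and AFTER bytes on one line.
# Usage: byte_context PATTERN BEFORE AFTER FILE
byte_context() {
  pattern=$1 before=$2 after=$3 file=$4
  len=${#pattern}   # pattern length; equals the byte count for ASCII text
  # grep -ob prints "OFFSET:MATCH", where OFFSET is 0-based.
  LC_ALL=C grep -obF -- "$pattern" "$file" | cut -d: -f1 |
  while read -r offset; do
    start=$((offset - before))
    if [ "$start" -lt 0 ]; then
      start=0       # clamp the window at the start of the file
    fi
    count=$((offset - start + len + after))
    # tail -c +N starts output at byte N (1-based); head -c keeps COUNT bytes.
    tail -c +"$((start + 1))" "$file" | head -c "$count"
    echo
  done
}
```

For the sample content from the question, byte_context 890abc 3 3 infile prints 567890abcdef. Each match re-reads the file from the start offset, so this suits a handful of matches better than thousands.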

硬不硬你别怂 2024-12-15 00:24:13

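The best way I can think of doing this is with a tiny Perl script.

#!/usr/bin/perl
use strict;
use warnings;

# Remove the three arguments from @ARGV so that <> reads standard input
# instead of trying to open them as files.
my ($pattern, $before, $after) = splice(@ARGV, 0, 3);

while (<>) {
  # $& holds the text of the last match; print each match on its own line.
  print "$&\n" while /.{$before}$pattern.{$after}/g;
}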

You would then execute it thusly:

cat very_large_file | ./myPerlScript.pl 890abc 3 3

EDIT: Dang, Paolo's solution is much easier. Oh well, viva la Perl!
