How to get the part of a file after the first line that matches a regular expression

Posted on 2024-11-30 10:43:26

I have a file with about 1000 lines. I want the part of my file after the line which matches my grep statement.

That is:

cat file | grep 'TERMINATE'     # It is found on line 534

So, I want the file from line 535 to line 1000 for further processing.

How can I do that?

╭ゆ眷念 2024-12-07 10:43:26

The following will print the line matching TERMINATE till the end of the file:

sed -n -e '/TERMINATE/,$p'

Explained: -n disables sed's default behavior of printing each line after executing its script on it, -e indicates a script to sed, /TERMINATE/,$ is an address (line) range selection meaning from the first line matching the TERMINATE regular expression (like grep) to the end of the file ($), and p is the print command, which prints the current line.

This will print from the line that follows the line matching TERMINATE till the end of the file:
(from AFTER the matching line to EOF, NOT including the matching line)

sed -e '1,/TERMINATE/d'

Explained: 1,/TERMINATE/ is an address (line) range selection meaning from the first line of the input to the first line matching the TERMINATE regular expression, and d is the delete command, which deletes the current line and skips to the next line. As sed's default behavior is to print the lines, it will print the lines after TERMINATE to the end of the input.
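A quick way to see the difference between the two commands is to run them on a small throwaway file. The sample contents and the `sample` variable are made up for illustration. The last command uses `0,/TERMINATE/`, which is a GNU sed extension (it may not work with other sed implementations); it covers the edge case where TERMINATE is on the very first line.

```shell
# Build a small sample file; its contents are illustrative only.
sample=$(mktemp)
printf '%s\n' 'line 1' 'line 2' 'TERMINATE' 'line 4' 'line 5' > "$sample"

# From the matching line to EOF (matching line included).
sed -n -e '/TERMINATE/,$p' "$sample"

# From the line after the match to EOF (matching line excluded).
sed -e '1,/TERMINATE/d' "$sample"

# GNU sed only: with 0,/re/ the range also ends if line 1 itself matches,
# whereas 1,/re/ would then keep deleting until the *next* match.
printf '%s\n' 'TERMINATE' 'keep me' | sed -e '0,/TERMINATE/d'
```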

If you want the lines before TERMINATE:

sed -e '/TERMINATE/,$d'

And if you want both lines before and after TERMINATE in two different files in a single pass:

sed -e '1,/TERMINATE/w before
/TERMINATE/,$w after' file

The before and after files will both contain the line with TERMINATE, so to process each you need to use:

head -n -1 before
tail -n +2 after
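Put together, the single-pass split and the trimming look roughly like this. It is a sketch with a made-up input file in a scratch directory; it adds -n so sed does not also echo every line to stdout (the w commands still write their files), and `head -n -1` assumes GNU coreutils.

```shell
# Scratch directory and sample input (illustrative names and contents).
dir=$(mktemp -d)
printf '%s\n' 'a' 'b' 'TERMINATE' 'c' 'd' > "$dir/file"

# One pass: lines 1..TERMINATE go to "before", TERMINATE..EOF go to "after".
# Each "w filename" is followed by a newline so sed knows where the name ends.
sed -n -e "1,/TERMINATE/w $dir/before
/TERMINATE/,\$w $dir/after" "$dir/file"

# Both files still contain the TERMINATE line; drop it from each.
head -n -1 "$dir/before"   # a, b
tail -n +2 "$dir/after"    # c, d
```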

If you do not want to hard-code the filenames in the sed script, you can:

before=before.txt
after=after.txt
sed -e "1,/TERMINATE/w $before
/TERMINATE/,\$w $after" file

But then you have to escape the $ that means the last line, so the shell will not try to expand a $w variable (note that we now use double quotes around the script instead of single quotes).

Note that the newline after each filename in the script is important: sed takes the rest of the line after w as the filename, so the newline is how it knows where the filename ends.

How would you replace the hardcoded TERMINATE by a variable?

You would make a variable for the matching text and then do it the same way as the previous example:

matchtext=TERMINATE
before=before.txt
after=after.txt
sed -e "1,/$matchtext/w $before
/$matchtext/,\$w $after" file

To use a variable for the matching text with the previous examples:

## Print the line containing the matching text, till the end of the file:
## (from the matching line to EOF, including the matching line)
matchtext=TERMINATE
sed -n -e "/$matchtext/,\$p"
## Print from the line that follows the line containing the
## matching text, till the end of the file:
## (from AFTER the matching line to EOF, NOT including the matching line)
matchtext=TERMINATE
sed -e "1,/$matchtext/d"
## Print all the lines before the line containing the matching text:
## (from line 1 to BEFORE the matching line, NOT including the matching line)
matchtext=TERMINATE
sed -e "/$matchtext/,\$d"

The important points about replacing text with variables in these cases are:

  1. Variables ($variablename) enclosed in single quotes ['] won't "expand" but variables inside double quotes ["] will. So, you have to change all the single quotes to double quotes if they contain text you want to replace with a variable.
  2. The sed ranges also contain a $ immediately followed by a letter, like $p, $d, $w. These also look like variables to be expanded, so you have to escape those $ characters with a backslash [\], like \$p, \$d, \$w.
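A small sketch of why the quoting matters, using a made-up sample file: with single quotes the shell passes `$matchtext` through unexpanded, so sed searches for that literal text and prints nothing; with double quotes the variable expands, and `\$p` keeps the shell from treating `$p` as a variable.

```shell
matchtext=TERMINATE
f=$(mktemp)
printf '%s\n' one TERMINATE two three > "$f"

# Single quotes: sed receives the literal text $matchtext, which never occurs
# in the file, so nothing is printed.
sed -n -e '/$matchtext/,$p' "$f"

# Double quotes: $matchtext expands to TERMINATE; \$p reaches sed as $p.
sed -n -e "/$matchtext/,\$p" "$f"
```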
删除→记忆 2024-12-07 10:43:26

As a simple approximation you could use

grep -A100000 TERMINATE file

which greps for TERMINATE and outputs up to 100,000 lines following that line.

From the man page:

-A NUM, --after-context=NUM

Print NUM lines of trailing context after matching lines.
Places a line containing a group separator (--) between
contiguous groups of matches. With the -o or --only-matching
option, this has no effect and a warning is given.
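To see the group separator mentioned in the man page, and why a very large NUM avoids it, here is a sketch on the numbers 1 to 10 (assuming GNU grep's default "--" separator). With a small context two separate groups are printed; with a huge context everything from the match to EOF is one contiguous group.

```shell
# Small context: two separate groups, joined by a "--" separator line.
seq 10 | grep -A1 -e '^2$' -e '^7$'

# Huge context: one group from the match to EOF, so no separator.
seq 10 | grep -A100000 '^7$'
```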

暮年慕年 2024-12-07 10:43:26

A tool to use here is AWK:

awk 'BEGIN{ found=0} /TERMINATE/{found=1}  {if (found) print }' file

How does this work:

  1. We set the variable 'found' to zero, evaluating false
  2. if a match for 'TERMINATE' is found with the regular expression, we set it to one.
  3. If our 'found' variable evaluates to True, print :)

Solutions that read the whole file into memory might consume a lot of memory on very large files; this awk script only ever holds the current line.
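A minimal check of the found-flag script, reading the file directly rather than through cat; the sample lines are made up. Because the TERMINATE rule runs before the print rule on the same line, the matching line itself is printed too.

```shell
f=$(mktemp)
printf '%s\n' intro TERMINATE x y > "$f"

# found starts at 0; the TERMINATE rule sets it to 1 before the print rule
# runs, so output starts with the matching line.
awk 'BEGIN { found = 0 } /TERMINATE/ { found = 1 } found { print }' "$f"
```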

零時差 2024-12-07 10:43:26

If I understand your question correctly, you want the lines after TERMINATE, not including the TERMINATE line. AWK can do this in a simple way:

awk '{if(found) print} /TERMINATE/{found=1}' your_file

Explanation:

  1. Although it is not best practice, you can rely on the fact that all variables default to 0 or the empty string if not defined. So the first expression (if(found) print) will not print anything to start off with.
  2. After the printing is done, we check whether this is the start line (which should not be included).

This will print all lines after the TERMINATE-line.


Generalization:

  • You have a file with start- and end-lines and you want the lines between those lines excluding the start- and end-lines.
  • start- and end-lines could be defined by a regular expression matching the line.

Example:

$ cat ex_file.txt
not this line
second line
START
A good line to include
And this line
Yep
END
Nope more
...
never ever
$ awk '/END/{found=0} {if(found) print} /START/{found=1}' ex_file.txt
A good line to include
And this line
Yep
$

Explanation:

  1. If the end-line is found no printing should be done. Note that this check is done before the actual printing to exclude the end-line from the result.
  2. Print the current line if found is set.
  3. If the start-line is found then set found=1 so that the following lines are printed. Note that this check is done after the actual printing to exclude the start-line from the result.

Notes:

  • The code relies on the fact that all AWK variables default to 0 or the empty string if not defined. This is valid, but it may not be best practice, so you could add a BEGIN{found=0} to the start of the AWK expression.
  • If multiple start-end-blocks are found, they are all printed.
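The notes above can be checked with a sketch that has two START/END blocks (sample data made up). It adds the explicit BEGIN{found=0} and anchors the regexes with ^...$ so that only lines consisting exactly of START or END count, which is an assumption about how the markers look.

```shell
f=$(mktemp)
printf '%s\n' junk START a END junk START b c END tail > "$f"

# END check before printing (excludes END), START check after (excludes START).
awk 'BEGIN { found = 0 } /^END$/ { found = 0 } found { print } /^START$/ { found = 1 }' "$f"
```

Both blocks are printed, so the output is a, b and c.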
无力看清 2024-12-07 10:43:26
grep -A 10000000 'TERMINATE' file       

is much, much faster than sed, especially when working on a really big file. It works for up to 10M lines after the match (or whatever number you put in), so there isn't any harm in making this big enough to handle just about anything you hit.

拥抱我好吗 2024-12-07 10:43:26

There are many ways to do it with sed or awk:

sed -n '/TERMINATE/,$p' file

This looks for TERMINATE in your file and prints from that line up to the end of the file.

awk '/TERMINATE/,0' file

This is exactly the same behaviour as sed.

In case you know the number of the line from which you want to start printing, you can compare it with NR (the number of the current record, which by default is the current line number):

awk 'NR>=535' file

Example

$ seq 10 > a        #generate a file with one number per line, from 1 to 10
$ sed -n '/7/,$p' a
7
8
9
10
$ awk '/7/,0' a
7
8
9
10
$ awk 'NR>=7' a
7
8
9
10
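If the pattern or the start line comes from a shell variable, one option is awk's -v assignment instead of splicing the variable into the script. This is a sketch with made-up sample data; note that -v processes backslash escapes in the assigned value, which matters if the regex contains backslashes.

```shell
matchtext=TERMINATE
f=$(mktemp)
printf '%s\n' a TERMINATE b c > "$f"

# Range pattern with a dynamic regex: from the first line matching pat to EOF
# (the end condition 0 is never true).
awk -v pat="$matchtext" '$0 ~ pat, 0' "$f"

# Start at a line number passed in as a variable.
seq 10 | awk -v n=7 'NR >= n'
```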
薄情伤 2024-12-07 10:43:26

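Use Bash parameter expansion like the following (note that this removes everything up to and including the first TERMINATE, so the output begins with the rest of the matching line, not with the line after it):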

content=$(cat file)
echo "${content#*TERMINATE}"
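A sketch of both behaviors on an inline string standing in for `$(cat file)` (the text is made up). The second expansion removes the shortest prefix that contains TERMINATE and ends with a newline, that is, the whole matching line, so only the following lines remain. The newline is stored in a hypothetical `nl` variable with a quoted line break so the snippet also works in plain sh.

```shell
content='first
line with TERMINATE here
after 1
after 2'

# Shortest prefix ending with TERMINATE: the rest of that line (" here") stays.
printf '%s\n' "${content#*TERMINATE}"

# A newline character (the quotes span the line break).
nl='
'
# Shortest prefix containing TERMINATE and ending with a newline: the whole
# matching line is removed, leaving only the lines after it.
rest=${content#*TERMINATE*$nl}
printf '%s\n' "$rest"
```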
分开我的手 2024-12-07 10:43:26

If for any reason, you want to avoid using sed, the following will print the line matching TERMINATE till the end of the file:

tail -n "+$(grep -n 'TERMINATE' file | head -n 1 | cut -d ":" -f 1)" file

And the following will print from the line after the line matching TERMINATE till the end of the file:

tail -n "+$(($(grep -n 'TERMINATE' file | head -n 1 | cut -d ":" -f 1)+1))" file

It takes several processes to do what sed can do in one, and if the file changes between the execution of grep and tail, the result can be inconsistent, so I recommend using sed. Moreover, if the file does not contain TERMINATE, the first command fails (tail gets an invalid line count) and the second one prints the whole file.
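If you do use the grep/tail approach, a guard for the no-match case avoids both problems. This is a sketch with a made-up sample file; the error message text is arbitrary.

```shell
f=$(mktemp)
printf '%s\n' a TERMINATE b c > "$f"

# Line number of the first match; empty if there is none.
n=$(grep -n 'TERMINATE' "$f" | head -n 1 | cut -d: -f1)
if [ -n "$n" ]; then
  tail -n "+$((n + 1))" "$f"     # lines after the match: b, c
else
  echo 'TERMINATE not found' >&2
fi
```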

策马西风 2024-12-07 10:43:26

Alternatives to the excellent sed answer by jfg956 that don't include the matching line:

明明#如月 2024-12-07 10:43:26

This could be one way of doing it, if you know on which line of the file your grep word is and how many lines the file has. With TERMINATE on line 534 of a 1000-line file, 1000 - 534 = 466 lines follow it:

grep -A466 'TERMINATE' file
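Instead of working out 466 by hand, the shell can compute the count from the total line count and the line number of the first match. This is a sketch on a made-up 10-line file with TERMINATE on line 7; wrapping wc in $(( )) strips the leading blanks some wc implementations print.

```shell
f=$(mktemp)
seq 10 | sed 's/^7$/TERMINATE/' > "$f"    # TERMINATE on line 7 of 10

total=$(( $(wc -l < "$f") ))
line=$(grep -n 'TERMINATE' "$f" | head -n 1 | cut -d: -f1)

# Lines after the match = total - line (10 - 7 = 3 here, 1000 - 534 = 466 in the question).
grep -A "$((total - line))" 'TERMINATE' "$f"
```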
沙与沫 2024-12-07 10:43:26

In my bash command I am looking for some mark lines in the text file log.txt.
My mark is #mark1678793202693, and it occurs two times in the text file. I always want to print the block between the two identical marks.

$a contains the line numbers of all lines in log.txt that contain the mark

$aro is $a converted to an array

$s is the start line: the line number of the first match

$e is the end line: the line number of the second match

Now you can use sed to print from line $s to line $e (both mark lines included):

a=$(awk '/#mark1678793202693/{print NR}' log.txt); aro=($a); s=${aro[0]}; e=${aro[1]}; sed -n -e "${s},${e}p" log.txt
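As an alternative to collecting line numbers first, a single awk pass can print the same block. This is a sketch with made-up log contents; like the sed command above it includes both mark lines, and it stops reading after the second mark.

```shell
log=$(mktemp)
printf '%s\n' noise '#mark1678793202693' 'step 1' 'step 2' '#mark1678793202693' 'more noise' > "$log"

# n counts the marks seen so far: print while n is 1 or 2, exit at the second mark.
awk '/#mark1678793202693/ { n++ } n >= 1 { print } n == 2 { exit }' "$log"
```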
狂之美人 2024-12-07 10:43:26

sed is a much better tool for the job:

sed -n '/re/,$p' file

where re is a regular expression.

Another option is grep's --after-context (-A) flag. You need to pass in the number of lines to print after the match; running wc -l on the file gives a value that is always large enough. Combine this with your match expression.
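A sketch of the wc-based variant on a made-up file: the total line count is an upper bound on how many lines can follow any match, so the trailing context always reaches EOF. The $(( )) wrapper strips leading blanks that some wc implementations add.

```shell
f=$(mktemp)
printf '%s\n' a b TERMINATE c d > "$f"

# The file's total line count is always enough trailing context to reach EOF.
grep -A "$(( $(wc -l < "$f") ))" 'TERMINATE' "$f"
```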

葮薆情 2024-12-07 10:43:26

This will print all lines from the last found line "TERMINATE" till the end of the file:

LINE_NUMBER=$(grep -n 'TERMINATE' "$YOUR_FILE_NAME" | tail -n 1 | cut -d: -f1)
tail -n "+$LINE_NUMBER" "$YOUR_FILE_NAME"
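awk can also do this on its own by reading the file twice: the first pass records the line number of the last match and the second pass prints from there. This is a sketch with a made-up sample file; the `last &&` guard makes it print nothing when there is no match, instead of the whole file.

```shell
f=$(mktemp)
printf '%s\n' a TERMINATE b TERMINATE c d > "$f"

# Pass 1 (NR == FNR): remember the line number of the last match.
# Pass 2: print from that line to EOF, or nothing if there was no match.
awk 'NR == FNR { if (/TERMINATE/) last = FNR; next } last && FNR >= last' "$f" "$f"
```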