How to get the part of a file after the first line that matches a regular expression
I have a file with about 1000 lines. I want the part of my file after the line which matches my grep statement.
That is:
cat file | grep 'TERMINATE' # It is found on line 534
So, I want the file from line 535 to line 1000 for further processing.
How can I do that?
The following will print the line matching `TERMINATE` till the end of the file:
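The command for this step is missing from this copy of the answer. A minimal sketch that is consistent with the explanation that follows, assuming a small sample file (the file contents and the `from_match` variable are made up for illustration):

```shell
# Hypothetical sample input: the marker is on line 3.
file=$(mktemp)
printf '%s\n' one two TERMINATE four five > "$file"

# Print from the first line matching TERMINATE through the last line ($).
from_match=$(sed -n -e '/TERMINATE/,$p' "$file")
printf '%s\n' "$from_match"

rm -f "$file"
```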
`-n` disables the default behavior of `sed` of printing each line after executing its script on it, `-e` indicates a script to `sed`, `/TERMINATE/,$` is an address (line) range selection meaning from the first line matching the `TERMINATE` regular expression (like grep) to the end of the file (`$`), and `p` is the print command, which prints the current line.

This will print from the line that follows the line matching `TERMINATE` till the end of the file (from AFTER the matching line to EOF, NOT including the matching line):
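The command is missing here as well. A sketch built from the `1,/TERMINATE/` range and the `d` command described in the explanation, again on a made-up sample file:

```shell
# Hypothetical sample input: the marker is on line 3.
file=$(mktemp)
printf '%s\n' one two TERMINATE four five > "$file"

# Delete line 1 through the first match; sed prints whatever is left.
after_match=$(sed -e '1,/TERMINATE/d' "$file")
printf '%s\n' "$after_match"

rm -f "$file"
```

One caveat: when the second address is a regular expression, sed starts looking for it on the line after the first address, so a `TERMINATE` on line 1 itself would not end the range.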
Explained: `1,/TERMINATE/` is an address (line) range selection meaning from the first line of the input to the first line matching the `TERMINATE` regular expression, and `d` is the delete command, which deletes the current line and skips to the next line. As `sed`'s default behavior is to print the lines, it will print the lines after `TERMINATE` to the end of the input.

If you want the lines before `TERMINATE`, delete the range `/TERMINATE/,$` instead.

And if you want both the lines before and after `TERMINATE` in two different files in a single pass, write each range to its own file with the `w` command. The before and after files will both contain the line with `TERMINATE`, so you need to drop the last line of the before file and the first line of the after file before processing them.
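The commands for these three steps did not survive in this copy. A sketch under stated assumptions: the output files are named `before` and `after` (as the text suggests), `-n` is added only to keep the input off standard output, and `sed '$d'` stands in for whatever the answer used to trim the last line.

```shell
# Hypothetical sample input in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' one two TERMINATE four five > "$dir/file"

# Lines before the marker: delete from the first match to the last line.
before_only=$(sed -e '/TERMINATE/,$d' "$dir/file")

# One pass, two output files. The newline after "before" ends that filename.
(
  cd "$dir" || exit 1
  sed -n -e '1,/TERMINATE/w before
/TERMINATE/,$w after' file
)

# Both files contain the marker line: drop it from the end of "before"
# and from the start of "after".
before_lines=$(sed '$d' "$dir/before")
after_lines=$(tail -n +2 "$dir/after")
printf '%s\n' "$before_lines" "$after_lines"

rm -rf "$dir"
```

`head -n -1 before` is another common way to drop the last line, but negative counts are a GNU extension.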
If you do not want to hard-code the filenames in the sed script, you can keep them in shell variables and put the script in double quotes.
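A sketch of what that could look like. The variable names `before` and `after` and the `.txt` filenames are assumptions, not taken from the answer:

```shell
# Hypothetical sample input and output names.
dir=$(mktemp -d)
printf '%s\n' one two TERMINATE four five > "$dir/file"
before="$dir/before.txt"
after="$dir/after.txt"

# Double quotes let the shell expand $before and $after; the backslash in
# \$w keeps the "last line" address from being read as a shell variable.
sed -n -e "1,/TERMINATE/w $before
/TERMINATE/,\$w $after" "$dir/file"

before_content=$(cat "$before")
after_content=$(cat "$after")
printf '%s\n' "$before_content" "$after_content"

rm -rf "$dir"
```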
But then you have to escape the `$` that means the last line, so the shell will not try to expand `$w` as a variable (note that we now use double quotes around the script instead of single quotes).

I forgot to mention that the newline after each filename in the script is important, so that sed knows where the filename ends.
How would you replace the hardcoded `TERMINATE` by a variable?

You would make a variable for the matching text and then do it the same way as the previous example:
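The example itself is missing from this copy. A sketch, assuming a variable named `matchtext` (hypothetical) alongside hypothetical `before` and `after` filename variables:

```shell
# Hypothetical sample input, output names, and match text.
dir=$(mktemp -d)
printf '%s\n' one two TERMINATE four five > "$dir/file"
matchtext=TERMINATE
before="$dir/before.txt"
after="$dir/after.txt"

# $matchtext, $before and $after are expanded by the shell; \$w is not.
sed -n -e "1,/$matchtext/w $before
/$matchtext/,\$w $after" "$dir/file"

before_content=$(cat "$before")
after_content=$(cat "$after")
printf '%s\n' "$before_content" "$after_content"

rm -rf "$dir"
```

This assumes the match text contains no `/` and no characters that are special in a regular expression; otherwise it would need escaping first.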
To use a variable for the matching text with the previous examples:
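These variants are also missing from this copy. A sketch of the three earlier commands with the pattern taken from a hypothetical `matchtext` variable:

```shell
# Hypothetical sample input and match text.
file=$(mktemp)
printf '%s\n' one two TERMINATE four five > "$file"
matchtext=TERMINATE

# From the matching line to EOF, including the matching line.
from_match=$(sed -n -e "/$matchtext/,\$p" "$file")
# From the line after the match to EOF.
after_match=$(sed -e "1,/$matchtext/d" "$file")
# Everything before the matching line.
before_match=$(sed -e "/$matchtext/,\$d" "$file")
printf '%s\n' "$from_match" "$after_match" "$before_match"

rm -f "$file"
```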
The important points about replacing text with variables in these cases are:

1. Variables (`$variablename`) enclosed in single quotes [`'`] won't "expand", but variables inside double quotes [`"`] will. So you have to change all the single quotes to double quotes if they contain text you want to replace with a variable.
2. The `sed` ranges also contain a `$` immediately followed by a letter, like `$p`, `$d`, `$w`. These will also look like variables to be expanded, so you have to escape those `$` characters with a backslash [`\`], like `\$p`, `\$d`, `\$w`.
As a simple approximation you could use `grep` with the `-A` option and a count of 100000, which greps for `TERMINATE` and outputs up to 100,000 lines following that line. From the man page, `-A NUM` (`--after-context=NUM`) prints NUM lines of trailing context after matching lines.
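A sketch of this approach on a made-up sample file. Note that the matching line itself is part of the output; piping through `tail -n +2` is one way to drop it when there is a single match.

```shell
# Hypothetical sample input: the marker is on line 3.
file=$(mktemp)
printf '%s\n' one two TERMINATE four five > "$file"

# The match plus up to 100000 lines of trailing context.
from_match=$(grep -A 100000 'TERMINATE' "$file")
# Drop the first output line (the match) to keep only what follows it.
after_match=$(grep -A 100000 'TERMINATE' "$file" | tail -n +2)
printf '%s\n' "$from_match" "$after_match"

rm -f "$file"
```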
A tool to use here is AWK:
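The AWK program is missing from this copy, so the following is only a guess at its shape: a flag variable (named `found` here by assumption) that is switched on at the first match. Whether the original printed the matching line is not recoverable; this sketch does.

```shell
# Hypothetical sample input: the marker is on line 3.
file=$(mktemp)
printf '%s\n' one two TERMINATE four five > "$file"

# found starts at 0, flips to 1 on the first match, and every line seen
# while found is non-zero is printed (the matching line included).
from_match=$(awk 'BEGIN { found = 0 } /TERMINATE/ { found = 1 } found { print }' "$file")
printf '%s\n' "$from_match"

rm -f "$file"
```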
How does this work:
The other solutions might consume a lot of memory if you use them on very large files.
If I understand your question correctly, you do want the lines after `TERMINATE`, not including the `TERMINATE` line. AWK can do this in a simple way:
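The one-liner is missing here. A sketch assembled from the fragments that survive in the explanation (`if(found) print` and `found=1`); the sample file is made up:

```shell
# Hypothetical sample input: the marker is on line 3.
file=$(mktemp)
printf '%s\n' one two TERMINATE four five > "$file"

# Print first, then test for the marker: the flag is set only after the
# marker line has already been passed over, so that line is excluded.
after_match=$(awk '{ if (found) print } /TERMINATE/ { found = 1 }' "$file")
printf '%s\n' "$after_match"

rm -f "$file"
```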
The `if(found) print` part will not print anything to start off with, because `found` is still unset.

This will print all lines after the `TERMINATE` line.

Generalization:

Example:
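The generalization and its example are missing from this copy. Going by the explanation that follows (a `found` flag and a start-line that is excluded from the output), one plausible reading is printing the lines between a start marker and an end marker. `START` and `END` below are hypothetical markers chosen for illustration:

```shell
# Hypothetical sample input with a start and an end marker.
file=$(mktemp)
printf '%s\n' skipped START 'keep 1' 'keep 2' END 'skipped too' > "$file"

# END clears the flag before printing, START sets it after printing,
# so neither marker line appears in the output.
between=$(awk '/END/ { found = 0 } { if (found) print } /START/ { found = 1 }' "$file")
printf '%s\n' "$between"

rm -f "$file"
```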
Explanation: the current line is printed if `found` is set. When the start-line matches, `found=1` is set so that the following lines are printed. Note that this check is done after the actual printing to exclude the start-line from the result.

Notes: the initialization `BEGIN{found=0}` can be added to the start of the AWK expression.
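The next answer opens with a command that was lost in this copy. Given the 10M-line figure it mentions, it is presumably `grep` with a very large after-context count; a sketch under that assumption:

```shell
# Hypothetical sample input: the marker is on line 3.
file=$(mktemp)
printf '%s\n' one two TERMINATE four five > "$file"

# The match plus up to ten million lines after it.
from_match=$(grep -A 10000000 'TERMINATE' "$file")
printf '%s\n' "$from_match"

rm -f "$file"
```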
This is much, much faster than `sed`, especially when working on a really big file. It works up to 10M lines (or whatever you put in), so there isn't any harm in making this big enough to handle about anything you hit.
There are many ways to do it with `sed` or `awk`. With `sed`, you can look for `TERMINATE` in your file and print from that line up to the end of the file. `awk` can give exactly the same behaviour as `sed`.

In case you know the number of the line from which you want to start printing, you can specify it together with `NR` (the number of the current record, which here is the line number).
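The commands and the example output are missing from this copy. A sketch of the three variants described, on a made-up five-line file where the marker is on line 3 (so `NR>=4` stands in for the `535` of the question):

```shell
# Hypothetical sample input: the marker is on line 3.
file=$(mktemp)
printf '%s\n' one two TERMINATE four five > "$file"

# sed: print from the first match to the last line.
with_sed=$(sed -n '/TERMINATE/,$p' "$file")
# awk: a range whose end condition (0) is never true runs to EOF.
with_awk=$(awk '/TERMINATE/,0' "$file")
# awk: start at a known line number.
from_line=$(awk 'NR>=4' "$file")
printf '%s\n' "$with_sed" "$with_awk" "$from_line"

rm -f "$file"
```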
Use Bash parameter expansion like the following:
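The snippet is missing from this copy. A sketch of the idea using the `${var#pattern}` form, which removes the shortest prefix matching the pattern; the variable names `content` and `rest` are assumptions:

```shell
# Hypothetical sample input: the marker is on line 3.
file=$(mktemp)
printf '%s\n' one two TERMINATE four five > "$file"

# Read the whole file into a variable, then strip everything up to and
# including the first TERMINATE.
content=$(cat "$file")
rest=${content#*TERMINATE}
printf '%s\n' "$rest"

rm -f "$file"
```

Two things to keep in mind: the whole file is held in memory, and the cut happens right after the word `TERMINATE`, so the remainder of that line (here just its newline) is still part of the result.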
If for any reason you want to avoid using `sed`, `grep` and `tail` can be combined to print from the line matching `TERMINATE` till the end of the file, or from the line following it till the end of the file. It takes two processes to do what `sed` can do in one process, and if the file changes between the execution of `grep` and `tail`, the result can be inconsistent, so I recommend using `sed`. Moreover, if the file does not contain `TERMINATE`, the first variant fails.
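Neither command survived in this copy. A sketch of one way to combine `grep` and `tail` as described, assuming the line number is taken from `grep -n` (the `line` variable and the `head`/`cut` steps are illustrative choices, not necessarily the answer's):

```shell
# Hypothetical sample input: the marker is on line 3.
file=$(mktemp)
printf '%s\n' one two TERMINATE four five > "$file"

# Line number of the first match ("3" here).
line=$(grep -n 'TERMINATE' "$file" | head -n 1 | cut -d: -f1)

# tail -n +N starts output at line N.
from_match=$(tail -n "+$line" "$file")
after_match=$(tail -n "+$((line + 1))" "$file")
printf '%s\n' "$from_match" "$after_match"

rm -f "$file"
```

With no match, `line` is empty and `tail -n +` is an invalid invocation.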
Alternatives to the excellent `sed` answer by jfg956, and which don't include the matching line:

- `awk '/TERMINATE/ {y=1;next} y'` (Hai Vu's answer to 'grep +A': print everything after a match)
- `awk '/TERMINATE/ ? c++ : c'` (Steven Penny's answer to 'grep +A': print everything after a match)
- `perl -ne 'print unless 1 .. /TERMINATE/'` (tchrist's answer to 'grep +A': print everything after a match)
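The two `awk` variants behave the same when the marker occurs once, but differ when it occurs again later. A small sketch on a made-up file with two markers shows the difference:

```shell
# Hypothetical sample input: the marker appears on lines 2 and 4.
file=$(mktemp)
printf '%s\n' one TERMINATE two TERMINATE three > "$file"

# Every marker line is skipped by "next", including later ones.
skip_all=$(awk '/TERMINATE/ {y=1;next} y' "$file")
# c++ yields the old value: 0 (false) on the first marker, non-zero on
# later markers, so only the first marker line is left out.
skip_first=$(awk '/TERMINATE/ ? c++ : c' "$file")
printf '%s\n' "$skip_all" "$skip_first"

rm -f "$file"
```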
This could be one way of doing it, if you know on which line of the file your grep word is and how many lines the file has:
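The command is missing from this copy. One way to use those two numbers is to print the last (total minus match) lines; for the file in the question that is 1000 - 534 = 466 lines. In this sketch the numbers are computed rather than typed in, and the names `match` and `total` are assumptions:

```shell
# Hypothetical sample input: the marker is on line 3 of 5.
file=$(mktemp)
printf '%s\n' one two TERMINATE four five > "$file"

match=$(grep -n 'TERMINATE' "$file" | head -n 1 | cut -d: -f1)
total=$(wc -l < "$file" | tr -d ' ')

# The lines after the match are the last (total - match) lines.
after_match=$(tail -n "$((total - match))" "$file")
printf '%s\n' "$after_match"

rm -f "$file"
```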
In my bash command I am looking for some marker lines in the text file log.txt.
My mark is #mark1678793202693; it occurs twice in the text file. I always want to print the block between the two identical marks.
$a contains all the line numbers found in the text file log.txt
$aro is the variable $a converted to an array
$s is the start line: the line where the pattern is first found
$e is the end line: the second occurrence of the pattern in log.txt
Now you can use sed to print the lines from line number $s to line number $e
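The commands behind these variables are missing from this copy. A sketch of the same idea, with one deliberate simplification: it skips the intermediate array `$aro` and reads `$s` and `$e` straight from `$a`, which keeps it runnable in plain `sh`. The sample log contents are made up.

```shell
# Hypothetical log with the mark on lines 2 and 5.
log=$(mktemp)
mark='#mark1678793202693'
printf '%s\n' before "$mark" 'inside 1' 'inside 2' "$mark" after > "$log"

# Line numbers of every line containing the mark, one per line.
a=$(grep -n "$mark" "$log" | cut -d: -f1)
s=$(printf '%s\n' "$a" | sed -n '1p')   # first occurrence
e=$(printf '%s\n' "$a" | sed -n '2p')   # second occurrence

# From mark to mark, both marks included.
with_marks=$(sed -n "${s},${e}p" "$log")
# Strictly between the marks.
block=$(sed -n "$((s + 1)),$((e - 1))p" "$log")
printf '%s\n' "$block"

rm -f "$log"
```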
sed is a much better tool for the job:
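The command is missing from this copy. A sketch of a `sed` range print with the pattern held in a shell variable called `re`, to match the name used in the text; the pattern value and sample file are made up:

```shell
# Hypothetical sample input and pattern.
file=$(mktemp)
printf '%s\n' one two TERMINATE four five > "$file"
re='^TERM'

# Print from the first line matching $re to the last line.
from_match=$(sed -n "/$re/,\$p" "$file")
printf '%s\n' "$from_match"

rm -f "$file"
```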
where `re` is a regular expression.

Another option is grep's `--after-context` flag. You need to pass in a number to end at; using `wc` on the file should give the right value to stop at. Combine this with `-n` and your match expression.
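A sketch of the `grep` variant, using `-A` (the short form of `--after-context`) with the file's line count as the upper bound. The `-n` mentioned in the text is left out here because its intended role is not clear from this copy; with `grep` it would prefix each output line with its line number.

```shell
# Hypothetical sample input: the marker is on line 3.
file=$(mktemp)
printf '%s\n' one two TERMINATE four five > "$file"

# The total line count is always enough context to reach the end.
total=$(wc -l < "$file" | tr -d ' ')
from_match=$(grep -A "$total" 'TERMINATE' "$file")
printf '%s\n' "$from_match"

rm -f "$file"
```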
This will print all lines from the last found line "TERMINATE" till the end of the file:
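The command is missing from this copy. A sketch of one way to start at the last match instead of the first, assuming `grep -n` plus `tail` (the `last` variable name is illustrative):

```shell
# Hypothetical sample input: the marker appears on lines 2 and 4.
file=$(mktemp)
printf '%s\n' one TERMINATE two TERMINATE three > "$file"

# Line number of the last matching line ("4" here).
last=$(grep -n 'TERMINATE' "$file" | tail -n 1 | cut -d: -f1)

# Print from that line to the end of the file.
from_last=$(tail -n "+$last" "$file")
printf '%s\n' "$from_last"

rm -f "$file"
```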