How do I delete all lines containing a specific string from a text file?
How would I use sed to delete all lines in a text file that contain a specific string?
To remove the line and print the output to standard out:
To directly modify the file – does not work with BSD sed:
Same, but for BSD sed (Mac OS X and FreeBSD) – does not work with GNU sed:
To directly modify the file (and create a backup) – works with BSD and GNU sed:
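The four commands themselves are missing from this copy of the answer. A minimal sketch of what they may have looked like, assuming a placeholder pattern "pattern to match" and a throwaway file called infile (both invented here for illustration):

```shell
# Hypothetical sample file; the name and contents are placeholders.
workdir=$(mktemp -d)
infile="$workdir/infile"
printf '%s\n' 'keep me' 'pattern to match here' 'keep me too' > "$infile"

# 1. Print everything except the matching lines to standard output;
#    the file itself is left untouched.
sed '/pattern to match/d' "$infile"

# 2. GNU sed only: edit the file in place, no backup.
#      sed -i '/pattern to match/d' "$infile"
# 3. BSD sed only (Mac OS X, FreeBSD): -i requires a suffix argument,
#    so an empty one is passed explicitly.
#      sed -i '' '/pattern to match/d' "$infile"

# 4. Accepted by both GNU and BSD sed: edit in place and keep a backup.
#    The suffix is attached directly to -i, with no space in between.
sed -i.bak '/pattern to match/d' "$infile"
```

Variants 2 and 3 are left as comments because each fails on the other sed implementation. After variant 4, infile holds the two remaining lines and infile.bak holds the original content.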
There are many other ways to delete lines containing a specific string besides sed:
AWK
Ruby (1.9+)
Perl
Shell (bash 3.2 and later)
GNU grep
And of course sed (printing the inverse is faster than actual deletion).
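The commands for the alternatives listed above (AWK, Ruby, Perl, shell, grep, sed) did not survive in this copy. The following is a sketch under assumptions, not the original answer's code: "pattern" and the file name are placeholders, and the shell loop uses a POSIX case match so it runs in any sh (with bash 3.2 or later, `[[ $line =~ pattern ]]` would allow a regex instead).

```shell
# Placeholder names: "pattern" is the string to remove, "$file" the target.
workdir=$(mktemp -d)
file="$workdir/file"
printf '%s\n' 'first' 'has pattern inside' 'last' > "$file"

# AWK: print only the lines that do not match.
awk '!/pattern/' "$file" > "$workdir/out.awk"

# Shell: read line by line and skip the ones containing the string.
while IFS= read -r line; do
  case $line in
    *pattern*) ;;                    # matching line: drop it
    *) printf '%s\n' "$line" ;;      # anything else: keep it
  esac
done < "$file" > "$workdir/out.sh"

# grep: -v inverts the match.
grep -v 'pattern' "$file" > "$workdir/out.grep"

# sed: suppress default output (-n) and print the non-matching lines.
sed -n '/pattern/!p' "$file" > "$workdir/out.sed"

# Ruby (1.9+) and Perl can edit in place with a backup suffix:
#   ruby -i.bak -ne 'print unless /pattern/' "$file"
#   perl -i.bak -ne 'print unless /pattern/' "$file"
```

Each out.* file can then be moved over the original with mv if an in-place result is wanted.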
You can use sed to delete lines in place in a file. However, it seems to be much slower than using grep to write the inverse match into a second file and then moving the second file over the original.
On my machine, anyway, the in-place sed command takes about 3 times longer than the grep-and-move approach.
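A sketch of the grep-and-move approach described above, assuming placeholder names filename and filename2 and a placeholder pattern. The status check is an addition of this sketch: grep exits with 1 when it selects no lines and with 2 on a real error, and only the latter should block the move.

```shell
workdir=$(mktemp -d)
filename="$workdir/filename"
printf '%s\n' 'alpha' 'drop pattern' 'omega' > "$filename"

# Write the non-matching lines to a second file.
grep -v 'pattern' "$filename" > "$workdir/filename2"
status=$?

# Status 0 or 1 means grep ran fine; 2 or higher means it failed,
# in which case the original is left alone.
if [ "$status" -le 1 ]; then
  mv "$workdir/filename2" "$filename"
fi
```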
The easy way to do it, with GNU sed:
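The command for the GNU sed answer is not present in this copy. One plausible form, assuming GNU sed and using the placeholders "some string here" and yourfile:

```shell
# Assumes GNU sed; the string and file name are placeholders.
workdir=$(mktemp -d)
yourfile="$workdir/yourfile"
printf '%s\n' 'one' 'some string here' 'two' > "$yourfile"

# --in-place is the long spelling of -i in GNU sed; BSD sed does not
# accept it.
sed --in-place '/some string here/d' "$yourfile"
```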
You may consider using ex (a standard Unix command-based editor), invoked with the following arguments:
+ executes the given Ex command (see man ex), the same as -c, which here executes wq (write and quit)
g/match/d is the Ex command that deletes lines containing the given match, see: Power of g
The above is a POSIX-compliant method for in-place editing of a file, as per this post at Unix.SE and the POSIX specifications for ex.
The difference from sed is that sed is a stream editor rather than a file editor, so editing files in place with it means unportable code, I/O overhead and some other bad side effects. Basically, some parameters (such as in-place/-i) are non-standard FreeBSD extensions and may not be available on other operating systems.
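The ex invocation itself is missing here. A sketch reconstructed from the argument descriptions above, with a placeholder file and the pattern "match"; the -s flag (batch mode) is an addition of this sketch to keep ex quiet in scripts:

```shell
# Placeholder file and pattern, made up for this sketch.
workdir=$(mktemp -d)
file="$workdir/file"
printf '%s\n' 'stay' 'a match here' 'stay too' > "$file"

# +cmd runs an Ex command once the file is loaded; -c does the same.
# Here: delete every line containing "match", then write and quit.
ex -s '+g/match/d' -cwq "$file"
```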
I was struggling with this on Mac. Plus, I needed to do it using variable substitution.
So I used:
sed -i '' "/$pattern/d" "$file"
where $file is the file to delete lines from and $pattern is the pattern to be matched for deletion.
I picked the '' from this comment.
The thing to note here is the use of double quotes in "/$pattern/d". The variable will not be expanded if we use single quotes.
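To illustrate the quoting point, here is a small sketch with made-up file contents. It leaves out -i so that it behaves the same with GNU and BSD sed and only prints the result:

```shell
workdir=$(mktemp -d)
file="$workdir/names.txt"
printf '%s\n' 'alice' 'bob' 'carol' > "$file"
pattern='bob'

# Single quotes: the shell passes $pattern through untouched, so sed
# searches for the literal text "$pattern" and deletes nothing.
single=$(sed '/$pattern/d' "$file")

# Double quotes: the shell expands $pattern before sed sees the script.
double=$(sed "/$pattern/d" "$file")

printf 'single quotes:\n%s\n' "$single"
printf 'double quotes:\n%s\n' "$double"
```

Because the variable is pasted into the script as-is, a pattern containing / or regex metacharacters would need escaping first.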
You can also use grep with the -v option.
Here -v prints only the lines that do not match your pattern (that is, it inverts the match).
You can also get an in-place-like result with grep by writing its output back over the original file.
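A sketch of one way to get that in-place-like effect, assuming a placeholder file and the pattern "seven". It relies on the shell finishing the command substitution before it opens the file for writing, so grep reads the full original content first. This is an illustration, not necessarily the exact command from the answer.

```shell
workdir=$(mktemp -d)
filename="$workdir/filename"
printf '%s\n' 'six' 'seven' 'eight' > "$filename"

# The $(...) runs first and captures the filtered text; only then is the
# redirection applied and the file overwritten.
printf '%s\n' "$(grep -v 'seven' "$filename")" > "$filename"
```

The whole result is held in memory, and a file whose lines all match ends up containing a single empty line, so the temp-file-plus-mv pattern is the safer choice for large or important files.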
Delete the matching lines from every file that contains a match
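The command for this answer is missing. One common way to do it, sketched under assumptions: "text_to_search" is a placeholder, the directory is a scratch directory, and -i.bak is used instead of a bare -i so that the same line works with GNU and BSD sed.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'keep' 'text_to_search here' > "$workdir/a.txt"
printf '%s\n' 'nothing to remove' > "$workdir/b.txt"

# grep -r walks the directory and -l prints only the names of files that
# contain a match; xargs hands those names to sed for in-place deletion.
# File names containing whitespace would need NUL-delimited handling.
grep -rl 'text_to_search' "$workdir" | xargs sed -i.bak '/text_to_search/d'
```

Only files that actually contain the string are rewritten, so b.txt is never touched and gets no backup.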
I have made a small benchmark with a file which contains approximately 345 000 lines. The way with grep seems to be around 15 times faster than the sed method in this case.
I have tried both with and without the setting LC_ALL=C; it does not seem to change the timings significantly. The search string (CDGA_00004.pdbqt.gz.tar) is somewhere in the middle of the file.
Here are the commands and the timings:
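The original commands and timing figures are not preserved in this copy, and no numbers are reproduced here. As a rough sketch of how such a comparison could be rerun, assuming bash (for the time keyword) and an invented input file with the search string in the middle:

```shell
# Build a test file of roughly the same size. The line contents are
# invented; only the search string comes from the text above.
workdir=$(mktemp -d)
needle='CDGA_00004.pdbqt.gz.tar'
{ seq 1 172500; printf '%s\n' "$needle"; seq 172501 345000; } > "$workdir/input.txt"
cp "$workdir/input.txt" "$workdir/for_sed.txt"
cp "$workdir/input.txt" "$workdir/for_grep.txt"

# sed: delete in place (the backup suffix keeps it portable).
time sed -i.bak "/$needle/d" "$workdir/for_sed.txt"

# grep: write the inverse match to a temp file, then move it into place.
time grep -v "$needle" "$workdir/for_grep.txt" > "$workdir/tmp.txt"
mv "$workdir/tmp.txt" "$workdir/for_grep.txt"
```

Prefixing both commands with LC_ALL=C repeats the locale comparison mentioned above. Results will vary with the sed and grep implementations, the disk and the file contents.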
SED:
sed '/James\|John/d' file
sed -n '/James\|John/!p' file
AWK:
awk '!/James|John/' file
awk '/James|John/ {next;} {print}' file
GREP:
grep -v 'James\|John' file
You can also delete a range of lines in a file.
For example, to delete stored procedures in a SQL file:
sed '/CREATE PROCEDURE.*/,/END ;/d' sqllines.sql
This removes every line from one matching CREATE PROCEDURE through the next line matching END ;, including both of those lines.
I have cleaned up many SQL files with this sed command.
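To see the range address at work, here is a small self-contained run. The SQL content is invented for the example; only the sed command comes from the answer.

```shell
workdir=$(mktemp -d)
sql="$workdir/sqllines.sql"
cat > "$sql" <<'EOF'
SELECT 1;
CREATE PROCEDURE demo()
BEGIN
  SELECT 2;
END ;
SELECT 3;
EOF

# addr1,addr2 selects from the first line matching addr1 through the next
# line matching addr2, inclusive; d deletes the whole range.
result=$(sed '/CREATE PROCEDURE.*/,/END ;/d' "$sql")
printf '%s\n' "$result"
```

Only the two SELECT statements outside the procedure remain. If a CREATE PROCEDURE line is never followed by an END ; line, the range runs to the end of the file, so it is worth checking the output before overwriting anything.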
The first command edits the file(s) in place (-i).
The second command does the same thing but keeps a copy or backup of the original file(s) by adding .bk to the file names (.bk can be changed to anything).
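The two commands this answer refers to are missing from this copy, and the text does not name the tool. The sketch below assumes Perl one-liners, whose -i and -i.bk options behave as described; "regexp" and the file names are placeholders.

```shell
# Assumption: Perl one-liners; pattern and file names are invented.
workdir=$(mktemp -d)
printf '%s\n' 'a' 'regexp line' 'b' > "$workdir/file1"
printf '%s\n' 'regexp again' 'c' > "$workdir/file2"
cp "$workdir/file1" "$workdir/file3"

# -i edits each named file in place; -n loops over the lines without
# printing them; the body prints only the lines that do not match.
perl -i -ne 'print unless /regexp/' "$workdir/file1" "$workdir/file2"

# Same, but -i.bk first saves each original as <name>.bk.
perl -i.bk -ne 'print unless /regexp/' "$workdir/file3"
```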
I found most of the answers not useful for me. If you use vim, I found this very easy and straightforward:
:g/<pattern>/d
echo -e "/thing_to_delete\ndd\033:x\n" | vim file_to_edit.txt
Curiously enough, the accepted answer does not actually answer the question directly. The question asks about using sed to delete lines containing a string, but the answer seems to presuppose knowledge of how to convert an arbitrary string into a regex.
Many programming language libraries have a function to perform such a transformation.
But how to do it on the command line?
Since this is a sed-oriented question, one approach would be to use sed itself to escape the string. Given an arbitrary string $STRING, the escaped result can then be used as the pattern of the deleting sed command, either in two steps or as a one-liner, with variations as described elsewhere on this page.
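The escaping commands did not survive in this copy. The following is a sketch of the idea rather than the answer's exact code: escape_bre is a hypothetical helper name, and the character list covers basic regular expression metacharacters plus the / used as the address delimiter. Strings containing newlines are not handled.

```shell
# Hypothetical helper: backslash-escape every character that is special in
# a basic regular expression, plus "/". The s command uses "|" as its
# delimiter so that "/" can appear inside the bracket expression.
escape_bre() {
  printf '%s\n' "$1" | sed 's|[][\.*^$/]|\\&|g'
}

workdir=$(mktemp -d)
file="$workdir/data.txt"
printf '%s\n' 'keep 1x5' 'price 1.5*[x]/y here' 'keep 125*x/y' > "$file"

STRING='1.5*[x]/y'

# Two steps: escape first, then use the result as the address.
escaped=$(escape_bre "$STRING")
two_step=$(sed "/$escaped/d" "$file")

# The same thing as a one-liner.
one_liner=$(sed "/$(printf '%s\n' "$STRING" | sed 's|[][\.*^$/]|\\&|g')/d" "$file")

printf '%s\n' "$two_step"
```

Without the escaping step, the / in the string would end the address early and the . and * would be treated as regex operators.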
Just in case someone wants to do it for exact matches of strings, you can use the -w flag in grep (w for whole word). For example, with -w you can delete the lines that contain the number 11 but keep the lines that contain the number 111.
It also works with the -f flag if you want to exclude several exact patterns at once, where "blacklist" is a file with one pattern per line that you want to delete from "file".
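A short sketch of both uses, with invented sample data. The file names file and blacklist follow the text; the contents are made up.

```shell
workdir=$(mktemp -d)
file="$workdir/file"
printf '%s\n' 'id 11' 'id 111' 'id 12' 'id 7' > "$file"

# -w matches 11 only as a whole word, so the line with 111 survives;
# -v inverts the match.
single=$(grep -vw '11' "$file")

# Several exact patterns at once: one pattern per line in the blacklist.
blacklist="$workdir/blacklist"
printf '%s\n' '11' '12' > "$blacklist"
several=$(grep -vwf "$blacklist" "$file")

printf '%s\n' "$single"
printf '%s\n' "$several"
```

Adding -F would additionally treat the blacklist entries as fixed strings rather than regular expressions.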
The treated text can be shown in the console, saved into a file, appended to an existing file, or treated again, in this case to remove more lines from what was left.
Piping through | more will show the text in chunks of one page at a time.
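The commands these descriptions belonged to are missing. A sketch of the four cases, assuming grep -v as the filter and using placeholder patterns "foo" and "bar" and placeholder file names:

```shell
workdir=$(mktemp -d)
filename="$workdir/filename"
printf '%s\n' 'apple' 'has foo' 'banana' 'has bar' > "$filename"

# Show the treated text in the console.
grep -v 'foo' "$filename"

# Save the treated text into a file (overwriting it if it exists).
grep -v 'foo' "$filename" > "$workdir/filename2.txt"

# Append the treated text to an existing file.
grep -v 'foo' "$filename" >> "$workdir/filename2.txt"

# Treat already treated text: chain a second grep to remove more lines.
twice=$(grep -v 'foo' "$filename" | grep -v 'bar')
printf '%s\n' "$twice"

# For long output, add "| more" to page through it one screen at a time:
#   grep -v 'foo' "$filename" | more
```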
You can use good old ed to edit a file in a similar fashion to the answer that uses ex. The big difference in this case is that ed takes its commands via standard input, not as command line arguments like ex can. When using it in a script, the usual way to accommodate this is to use printf to pipe commands to it, or to supply them with a heredoc.
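Both forms, sketched with placeholder file names and the pattern "pattern". The -s flag, which suppresses the byte counts ed normally prints, is an assumption about what the original commands used.

```shell
workdir=$(mktemp -d)
file="$workdir/file.txt"
printf '%s\n' 'one' 'pattern two' 'three' 'pattern four' > "$file"

# printf emits one ed command per line: delete all matching lines,
# write the file, quit.
printf '%s\n' 'g/pattern/d' 'w' 'q' | ed -s "$file"

# The same commands supplied through a heredoc, on a second sample file.
file2="$workdir/file2.txt"
printf '%s\n' 'pattern five' 'six' > "$file2"
ed -s "$file2" <<'EOF'
g/pattern/d
w
q
EOF
```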
This solution is for doing the same operation on multiple files.
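The command for this answer is not preserved, so the loop below is only one possible shape of it. It assumes the grep -v and mv pattern used elsewhere on this page, a placeholder pattern "Item" and a set of .txt files in a scratch directory.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'a' 'Item 1' > "$workdir/x.txt"
printf '%s\n' 'Item 2' 'b' 'c' > "$workdir/y.txt"

# Apply the same filter-and-replace step to every .txt file.
for f in "$workdir"/*.txt; do
  grep -v 'Item' "$f" > "$f.tmp"
  mv "$f.tmp" "$f"
done
```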