Case-insensitive search and replace with sed
I'm trying to use sed to extract text from a log file. I can do a search-and-replace without too much trouble:
sed 's/foo/bar/' mylog.txt
However, I want to make the search case-insensitive. From what I've googled, it looks like appending i to the end of the command should work:
sed 's/foo/bar/i' mylog.txt
However, this gives me an error message:
sed: 1: "s/foo/bar/i": bad flag in substitute command: 'i'
What's going wrong here, and how do I fix it?
Update: Starting with macOS Big Sur (11.0), sed does support the I flag for case-insensitive matching, so the command in the question should now work. BSD sed doesn't report its version, but you can go by the date at the bottom of the man page, which should be March 27, 2017 or more recent.

Note: I (uppercase) is the documented form of the flag, but i works as well.

Similarly, starting with macOS Big Sur (11.0), awk is locale-aware (awk --version should report 20200816 or more recent).

The following applies to macOS up to Catalina (10.15):
To be clear: On macOS, sed (which is the BSD implementation) does NOT support case-insensitive matching. Hard to believe, but true. The formerly accepted answer, which itself shows a GNU sed command, gained that status because of the perl-based solution mentioned in the comments.

To make that Perl solution work with non-ASCII characters as well, via UTF-8, add the following options to the perl invocation:

-C turns on UTF-8 support for streams and files, assuming the current locale is UTF-8-based.
-Mutf8 tells Perl to interpret the source code as UTF-8 (in this case, the string passed to -pe); it is the shorter equivalent of the more verbose -e 'use utf8;'. (Thanks, Mark Reed.)

(Note that using awk is not an option either, as awk on macOS (i.e., BWK awk and BSD awk) appears to be completely unaware of locales: its tolower() and toupper() functions ignore non-ASCII characters, and sub() / gsub() don't have case-insensitivity flags to begin with.)

A note on the relationship of sed and awk to the POSIX standard: BSD sed and awk limit their functionality mostly to what the POSIX sed (https://pubs.opengroup.org/onlinepubs/9699919799/utilities/sed.html) and POSIX awk specs mandate, whereas their GNU counterparts implement many more extensions.
Editor's note: This solution doesn't work on macOS (out of the box), because it only applies to GNU sed, whereas macOS comes with BSD sed.

Capitalize the 'I'.
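A minimal sketch of what capitalizing the flag looks like, assuming GNU sed (or, per the update above, BSD sed on macOS Big Sur and later). The sample lines are invented for illustration.

```shell
# Assumes a sed that accepts the I flag on the s command (e.g. GNU sed).
# The three input lines are hypothetical log lines.
result=$(printf 'FOO one\nFoo two\nfoo three\n' | sed 's/foo/bar/I')

printf '%s\n' "$result"
```

Each line has its first case-insensitive match of foo replaced, whichever way it is capitalized.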
Another workaround for sed on Mac OS X is to install gsed from MacPorts or Homebrew and then create the alias sed='gsed'.
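One way this workaround might look in a script is sketched below. It uses a variable named SED (a name chosen for this example) rather than the alias, because bash does not expand aliases in non-interactive scripts by default. The install commands are shown as comments only.

```shell
# Installation (run once, not executed here); pick the manager you use:
#   brew install gnu-sed      # Homebrew; installs the binary as gsed
#   sudo port install gsed    # MacPorts

# Use gsed when it is on the PATH, otherwise fall back to the system sed.
if command -v gsed >/dev/null 2>&1; then
    SED=gsed
else
    SED=sed
fi

# Hypothetical sample input; the I flag needs GNU sed or a recent BSD sed.
result=$(printf 'FOO\n' | "$SED" 's/foo/bar/I')
printf '%s\n' "$result"
```

In an interactive shell, the alias from the answer (alias sed='gsed') achieves the same thing with less typing.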
If you are doing pattern matching first, that is, selecting lines with a pattern before substituting on them, then you want to put the I after that pattern. With the I in place, the example input Fred is returned as willma; without the I, the string comes back untouched (Fred).
The sed FAQ addresses the closely related case-insensitive search. It points out that a) many versions of sed support a flag for it and b) it's awkward to do in sed, so you should rather use awk or Perl.

But to do it in POSIX sed, they suggest three options (adapted for substitution here):

1. Convert to uppercase and store the original line in hold space. This won't work for substitutions, though, as the original content will be restored before printing, so it's only good for inserting or appending lines based on a case-insensitive match.
2. Maybe the possibilities are limited to FOO, Foo and foo. These can each be covered explicitly.
3. To search for all possible matches, one can use bracket expressions for each character.
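The last two options can be sketched with portable sed syntax. The sample input is hypothetical, and the exact commands are an illustration rather than a quote from the FAQ.

```shell
# Hypothetical input: four spellings, one per line.
input=$(printf 'FOO\nFoo\nfoo\nfOo')

# Limited set of spellings: one substitution per expected variant.
# Anything not listed (here: fOo) is left alone.
limited=$(printf '%s\n' "$input" | sed -e 's/FOO/bar/' -e 's/Foo/bar/' -e 's/foo/bar/')

# All spellings: a bracket expression for every character of the word.
bracketed=$(printf '%s\n' "$input" | sed 's/[fF][oO][oO]/bar/')

printf '%s\n---\n%s\n' "$limited" "$bracketed"
```

The bracket form needs no extensions, at the cost of a pattern that grows with the length of the word.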
To replace all occurrences (not only the first match on each line), combine the case-insensitivity flag with the g flag.
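A short sketch of the difference, assuming a sed that supports the I flag (GNU sed, or BSD sed on macOS Big Sur and later). The input line is made up.

```shell
# Without g, only the first match on the line is replaced.
first_only=$(printf 'foo Foo FOO\n' | sed 's/foo/bar/I')

# With g, every case-insensitive match on the line is replaced.
all=$(printf 'foo Foo FOO\n' | sed 's/foo/bar/gI')

printf '%s\n%s\n' "$first_only" "$all"
```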
The Mac version of sed seems a bit limited. One way to work around this is to use a Linux container (via Docker), which has a usable version of sed.
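A possible shape for the container approach is sketched below. The helper name docker_sed and the image debian:stable-slim are assumptions made for this example; any image that ships GNU sed should serve the same purpose.

```shell
# Hypothetical helper: run sed inside a Linux container, reading from stdin.
#   --rm : remove the container when sed exits
#   -i   : keep stdin open so data can be piped in
docker_sed() {
    docker run --rm -i debian:stable-slim sed "$@"
}

# Usage (needs a running Docker daemon, so it is left commented out):
#   docker_sed 's/foo/bar/I' < mylog.txt
```

Starting a container for every invocation is slow, so this is probably better suited to occasional use than to tight loops.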
Not a direct answer, but in some contexts it's okay to pipe the whole thing through tr A-Z a-z to lowercase the entire stream.

Sure, you lose the uppercase letters, but that loss may be offset by simplifying other parts of the pipeline. Numbers and dates/times are unaffected too, and the output stream is going to compress better as well. Email addresses are treated as case-insensitive by most mail systems, so that rarely matters.

One downside is that case-sensitive identifiers might become awkward. Sendmail logs would be less useful this way.
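The lowercase-first idea can be sketched like this; the log line is invented for the example.

```shell
# Lowercase the whole stream first, then use an ordinary
# case-sensitive substitution that works in any sed.
result=$(printf 'FOO failed at 10:42\n' | tr 'A-Z' 'a-z' | sed 's/foo/bar/')

printf '%s\n' "$result"
```

The digits and the timestamp pass through unchanged, while every letter in the line ends up lowercase, including those outside the match.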
I had a similar need, and came up with an approach that combines grep and sed.

One command simply finds all the files that contain the text.

A second command excludes this_shell.sh (in case you put the command in a script called this_shell.sh), tees the output to the console to show what happened, and then uses sed on each file name found to replace the text foo with bar.

I chose this method because I didn't like having the timestamps changed for files that were not modified. Feeding in the grep result means only the files containing the target text are touched (which likely improves performance as well).

Be sure to back up your files and test before using. It may not work in some environments for files with embedded spaces. (?)
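The approach described above might be sketched as follows, assuming GNU grep and GNU sed (the -i in-place option takes a different form on BSD sed). The function name replace_ci_in_tree is hypothetical, and instead of tee the sketch prints each file name to stderr from inside the loop.

```shell
# Hypothetical helper: replace foo with bar, case-insensitively, in every
# file under a directory that actually contains the text.
replace_ci_in_tree() {
    dir=$1
    # -r: recurse, -i: ignore case, -l: print only the names of matching files
    grep -r -i -l --exclude='this_shell.sh' 'foo' "$dir" |
        while IFS= read -r file; do
            printf 'updating %s\n' "$file" >&2   # show what is being changed
            sed -i 's/foo/bar/gI' "$file"        # GNU sed in-place edit
        done
}

# Usage: replace_ci_in_tree ./logs
```

Reading names with IFS= read -r keeps embedded spaces intact, though file names containing newlines would still break this loop.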
The following should be fine: