Case-insensitive search and replace with sed

Posted 2024-10-06 22:39:35

I'm trying to use sed to extract text from a log file. I can do a search-and-replace without too much trouble:

sed 's/foo/bar/' mylog.txt

However, I want to make the search case-insensitive. From what I've googled, it looks like appending i to the end of the command should work:

sed 's/foo/bar/i' mylog.txt

However, this gives me an error message:

sed: 1: "s/foo/bar/i": bad flag in substitute command: 'i'

What's going wrong here, and how do I fix it?

Comments (10)

夏有森光若流苏 2024-10-13 22:39:36

Update: Starting with macOS Big Sur (11.0), sed now does support the I flag for case-insensitive matching, so the command in the question should now work (BSD sed doesn't report its version, but you can go by the date at the bottom of the man page, which should be March 27, 2017 or more recent); a simple example:

# BSD sed on macOS Big Sur and above (and GNU sed, the default on Linux)
$ sed 's/ö/@/I' <<<'FÖO'
F@O   # `I` matched the uppercase Ö correctly against its lowercase counterpart

Note: I (uppercase) is the documented form of the flag, but i works as well.

Similarly, starting with macOS Big Sur (11.0) awk now is locale-aware (awk --version should report 20200816 or more recent):

# BSD awk on macOS Big Sur and above (and GNU awk, the default on Linux)
$ awk '{ print tolower($0) }' <<<'FÖO'
föo  # non-ASCII character Ö was properly lowercased

The following applies to macOS up to Catalina (10.15):

To be clear: On macOS, sed - which is the BSD implementation - does NOT support case-insensitive matching - hard to believe, but true. The formerly accepted answer, which itself shows a GNU sed command, gained that status because of the perl-based solution mentioned in the comments.

To make that Perl solution work with non-ASCII characters as well, via UTF-8, use something like:

perl -C -Mutf8 -pe 's/öœ/oo/i' <<< "FÖŒ" # -> "Foo"
  • -C turns on UTF-8 support for streams and files, assuming the current locale is UTF-8-based.
  • -Mutf8 tells Perl to interpret the source code as UTF-8 (in this case, the string passed to -pe); this is the shorter equivalent of the more verbose -e 'use utf8;'. Thanks, Mark Reed.

(Note that using awk is not an option either, as awk on macOS (i.e., BWK awk and BSD awk) appears to be completely unaware of locales: its tolower() and toupper() functions ignore non-ASCII characters (and sub() / gsub() don't have case-insensitivity flags to begin with).)


A note on the relationship of sed and awk to the POSIX standard:

BSD sed and awk limit their functionality mostly to what the POSIX sed and POSIX awk specs mandate, whereas their GNU counterparts implement many more extensions.
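
Since support for the I flag depends on which sed is installed, a script can probe for it and fall back to Perl. What follows is only a sketch under stated assumptions: ci_replace is a made-up helper name (not part of sed or any standard tool), and the pattern and replacement are assumed to be simple literal text with no '/', '$', '@' or backslashes, so that they mean the same thing to sed and to Perl.

```shell
#!/bin/sh
# ci_replace PATTERN REPLACEMENT  (hypothetical helper, reads stdin)
# Replaces every case-insensitive match of PATTERN with REPLACEMENT.
ci_replace() {
  # Probe: does the local sed accept the I flag on the s command?
  if printf 'A\n' | sed 's/a/b/I' 2>/dev/null | grep -q '^b$'; then
    sed "s/$1/$2/gI"
  else
    # Fallback: Perl's /i modifier, as in the Perl solution above.
    perl -pe "s/$1/$2/gi"
  fi
}

printf 'Foo FOO foo\n' | ci_replace foo bar   # prints: bar bar bar
```

The probe costs one extra sed invocation per call; a longer script could run it once and store the result in a variable instead.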

來不及說愛妳 2024-10-13 22:39:36

Editor's note: This solution doesn't work on macOS (out of the box), because it only applies to GNU sed, whereas macOS comes with BSD sed.

Capitalize the 'I'.

sed 's/foo/bar/I' file

黯然 2024-10-13 22:39:36

Another workaround for sed on Mac OS X is to install gsed from MacPorts or Homebrew and then create the alias sed='gsed'.
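
One caveat with the alias: bash does not expand aliases in scripts by default, so inside a script it may be simpler to pick the binary explicitly. A minimal sketch, where pick_sed is a hypothetical helper name and the only assumption is that GNU sed, if installed via MacPorts or Homebrew, is on the PATH as gsed:

```shell
#!/bin/sh
# pick_sed (hypothetical helper): print "gsed" if it is installed,
# otherwise fall back to whatever "sed" is on the PATH.
pick_sed() {
  if command -v gsed >/dev/null 2>&1; then
    echo gsed
  else
    echo sed
  fi
}

SED=$(pick_sed)
# With GNU sed (or BSD sed on macOS Big Sur and later) this prints: bar
printf 'FOO\n' | "$SED" 's/foo/bar/I'
```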

小清晰的声音 2024-10-13 22:39:36

If you are doing pattern matching first, e.g.,

/pattern/s/xx/yy/g

then you want to put the I after the pattern:

/pattern/Is/xx/yy/g

Example:

echo Fred | sed '/fred/Is//willma/g'

returns willma; without the I, it returns the string untouched (Fred).

苏佲洛 2024-10-13 22:39:36

The sed FAQ addresses the closely related case-insensitive search. It points out that a) many versions of sed support a flag for it and b) it's awkward to do in sed, so you should rather use awk or Perl.

But to do it in POSIX sed, they suggest three options (adapted for substitution here):

  1. Convert to uppercase and store the original line in hold space; this won't work for substitutions, though, as the original content will be restored before printing, so it's only good for inserting or appending lines based on a case-insensitive match.

  2. Maybe the possibilities are limited to FOO, Foo and foo. These can be covered by

    s/FOO/bar/;s/[Ff]oo/bar/
    
  3. To search for all possible matches, one can use bracket expressions for each character:

    s/[Ff][Oo][Oo]/bar/
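
Writing the bracket expressions by hand gets tedious for longer words, so option 3 can be generated. This is just a sketch: ci_bracket is a hypothetical helper name, it only handles ASCII letters, and it assumes the word contains no characters that are special in a basic regular expression.

```shell
#!/bin/sh
# ci_bracket WORD (hypothetical helper): print WORD with every ASCII
# letter expanded to a bracket expression, e.g. foo -> [Ff][Oo][Oo].
ci_bracket() {
  printf '%s\n' "$1" | awk '{
    out = ""
    for (i = 1; i <= length($0); i++) {
      c = substr($0, i, 1)
      if (c ~ /[A-Za-z]/)
        out = out "[" toupper(c) tolower(c) "]"
      else
        out = out c          # non-letters are copied unchanged
    }
    print out
  }'
}

ci_bracket foo                                          # prints: [Ff][Oo][Oo]
printf 'FOO Foo fOo\n' | sed "s/$(ci_bracket foo)/bar/g"   # prints: bar bar bar
```

The generated pattern uses nothing beyond POSIX basic regular expressions, so it should behave the same in BSD and GNU sed.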
    
葬心 2024-10-13 22:39:36

Use the following to replace all occurrences:

sed 's/foo/bar/gI' mylog.txt

樱桃奶球 2024-10-13 22:39:36

The Mac version of sed seems a bit limited. One way to work around this is to use a Linux container (via Docker) which has a usable version of sed:

cat your_file.txt | docker run -i busybox /bin/sed -r 's/[0-9]{4}/****/Ig'

风和你 2024-10-13 22:39:36

Not a direct answer, but in some contexts it's okay to pipe the whole thing through tr A-Z a-z to lowercase the entire stream.

Sure, you lose the uppercase letters, but that loss may be offset by simplifying other parts of the pipeline. Numbers and date/time are unaffected too, and the output stream is going to compress better as well. Email addresses are not case-sensitive, so that doesn't matter.

One downside is that case-sensitive identifiers might become awkward. Sendmail logs would be less useful this way.
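
As a rough sketch of that pipeline: lowercase_stream is a hypothetical helper name, and it uses the POSIX character classes [:upper:] and [:lower:] in place of the A-Z a-z ranges.

```shell
#!/bin/sh
# lowercase_stream (hypothetical helper): lowercase everything on stdin.
lowercase_stream() {
  tr '[:upper:]' '[:lower:]'
}

# After lowercasing, a plain case-sensitive s command is enough.
printf 'Error: FOO failed\n' | lowercase_stream | sed 's/foo/bar/'
# prints: error: bar failed
```

Note that the rest of the line is lowercased too ("Error" became "error"), which is exactly the trade-off described above.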

时光是把杀猪刀 2024-10-13 22:39:36

I had a similar need, and came up with this:

This command simply finds all the files:

grep -i -l -r foo ./* 

This one excludes this_shell.sh (in case you put the command in a script called this_shell.sh), tees the output to the console to show what happened, and then uses sed on each file name found to replace the text foo with bar:

grep -i -l -r --exclude "this_shell.sh" foo ./* | tee  /dev/fd/2 | while read -r x; do sed -b -i 's/foo/bar/gi' "$x"; done 

I chose this method because I didn't like having the timestamps changed on files that were not modified. Feeding sed the grep results means only the files containing the target text are touched (which likely improves performance / speed as well).

Be sure to back up your files and test before using. May not work in some environments for files with embedded spaces. (?)
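
For the concern about unusual file names, one possible variation passes the names NUL-delimited instead of reading them line by line. This is a sketch, not a drop-in replacement: replace_in_tree is a hypothetical helper name, it assumes GNU grep, GNU sed and an xargs that supports -0 and -r, and it assumes the search text and replacement contain no '/' or regex metacharacters.

```shell
#!/bin/sh
# replace_in_tree DIR SEARCH REPLACEMENT (hypothetical helper)
# Edits in place only those files under DIR that contain SEARCH
# (case-insensitively), so other files keep their timestamps.
replace_in_tree() {
  # -r recurse, -l list file names only, -i ignore case,
  # -Z end each name with a NUL byte so spaces survive intact.
  grep -rliZ -- "$2" "$1" |
    xargs -0 -r sed -i "s/$2/$3/gI"   # -r: skip sed if nothing matched
}

# Example (test on a copy of your data first):
#   replace_in_tree ./logs foo bar
```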

花想c 2024-10-13 22:39:36

The following should be fine:

sed -i 's/foo/bar/gi' mylog.txt