Command substitution in a sed expression

Posted 2024-12-08 20:58:11

I'm having a small problem with bash/sed. I need to use command substitution inside a sed expression. I have two large text files:

  • The first is logfile.txt, which sometimes* reports errors by ID (0xdeadbeef is a common example) in the format ERRORID:0xdeadbeef

  • The second, errors.txt, stores the error messages as pairs: LONG_ERROR_DESCRIPTION, 0xdeadbeef

I was trying to use sed with bash command substitution to do the task:

cat logfile.txt | sed "s/ERRORID:\(0x[0-9a-f]*\)/ERROR:$(cat errors.txt |
    grep \1 | grep -o '^[A-Z_]*' )/g"

(^^^ this should be on one line, of course)

If it worked, I would get a nicer version of the log file with better error information:

   Lot's of meaningless stuff ERRORID:0xdeadbeef and something else =>
=> Lot's of meaningless stuff ERROR:LONG_ERROR_DESCRIPTION and something else 

But it doesn't work. The problem is that sed cannot "inject" the captured group (\1) into the command substitution. What other options do I have? I know I could build the sed expression first or do it some other way, but I would like to avoid parsing these files several times (they can be huge).

As always, many thanks for any help.

*There is no real formatting inside the log file. Sections, columns, and tab/comma separation are not used consistently.

PS. Just to explain: the following expression works, but of course no argument is passed inside it:

echo "my cute cat" | sed "s/cat/$(echo dog)/g"
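To make the failure concrete, here is a minimal sketch of what sed actually receives. The shell runs the $(...) part before sed starts, so \1 never refers to sed's capture group; unquoted inside the substitution it is simply the character 1. The temporary directory and the one-line sample errors.txt are assumptions for illustration only.

```shell
# Sample catalog (hypothetical data, one entry, containing no digit "1" on purpose)
workdir=$(mktemp -d)
printf 'LONG_ERROR_DESCRIPTION, 0xdeadbeef\n' > "$workdir/errors.txt"

# Build the same expression as in the question, but print it instead of running sed.
# The inner grep searches for the literal "1", finds nothing, and the
# substitution expands to an empty string.
seen_by_sed=$(printf '%s' "s/ERRORID:\(0x[0-9a-f]*\)/ERROR:$(grep \1 "$workdir/errors.txt" | grep -o '^[A-Z_]*')/g")

printf '%s\n' "$seen_by_sed"
# prints: s/ERRORID:\(0x[0-9a-f]*\)/ERROR:/g
```

By the time sed sees the expression, the replacement is already fixed text, which is why the lookup has to happen some other way.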


Comments (4)

青瓷清茶倾城歌 2024-12-15 20:58:11

You can create a sed script from the error message catalog, then apply that sed script to the log file.

Basically, something along these lines:

sed 's/\(.*\), 0x\([0-9A-Fa-f]*\)$/s%ERRORID:0x\2%ERROR:\1%g/' errors.txt |
sed -f - logfile.txt

The output from the first sed script should be something like this:

s%ERRORID:0x00000001%ERROR:Out of memory%g
s%ERRORID:0x00000002%ERROR:Stack overflow%g
s%ERRORID:0x00000031%ERROR:values of beta may cause dom%g

That is, a new sed script which specifies a substitution for each error code in the catalog.

There are different dialects of sed, so this may require minor tweaking. I believe the sed on Linux requires a backslash before grouping parentheses in regular expressions and happily accepts standard input as the argument to the -f option. This is not portable to other Unices, though (but you could substitute Perl for sed if you need portability).

*Edit: If the error messages are fairly static, and/or you want to read the log from standard input, save the generated script in a file:

# Do this once
sed 's/\(.*\), 0x\([0-9A-Fa-f]*\)$/s%ERRORID:0x\2%ERROR:\1%g/' errors.txt >errors.sed
# Use it many times
sed -f errors.sed logfile.txt

You could also add #!/usr/bin/sed -f at the top of errors.sed and chmod +x it to make it into a self-contained command script.
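As a rough end-to-end sketch of the two-pass idea above, the snippet below generates the sed script from a small sample catalog and applies it to a sample log. The temporary directory, the sample entries (including the OUT_OF_MEMORY name), and the [0-9A-Fa-f] character class that accepts lowercase hex digits are assumptions for illustration; the generated script is written to a file rather than piped through -f - so that it does not rely on that GNU behaviour.

```shell
# Hypothetical sample data in a scratch directory
workdir=$(mktemp -d)
printf '%s\n' 'OUT_OF_MEMORY, 0x00000001' \
              'LONG_ERROR_DESCRIPTION, 0xdeadbeef' > "$workdir/errors.txt"
printf '%s\n' 'stuff ERRORID:0xdeadbeef and something else' \
              'no id here' > "$workdir/logfile.txt"

# Pass 1 (errors.txt only): emit one "s%ERRORID:<id>%ERROR:<name>%g" per entry
sed 's/^\(.*\), \(0x[0-9A-Fa-f]*\)$/s%ERRORID:\2%ERROR:\1%g/' \
    "$workdir/errors.txt" > "$workdir/errors.sed"

# Pass 2 (logfile.txt only): apply every generated substitution to each line
result=$(sed -f "$workdir/errors.sed" "$workdir/logfile.txt")
printf '%s\n' "$result"
```

Each input file is read exactly once, which addresses the concern about parsing huge files several times; the cost is that sed tries every generated substitution on every log line.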

当梦初醒 2024-12-15 20:58:11

I don't know if this would work, since I can't get an answer on whether or not capture groups persist, but there is a lot more to sed than just the s command. I was thinking you could use a capture group in a regex line selector, then use that for the command substitution. Something like this:

/ERRORID:\(0x[0-9a-f]*\)/  s/ERRORID:0x[0-9a-f]*/ERROR:$(grep \1 errors.txt | grep -o '^[A-Z_]*' )/

Anyway, if that doesn't work I would change gears and point out that this is really a good job for Perl. Here's how I would do it, which I think is much cleaner / easier to understand:

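#!/usr/bin/perl
use strict;
use warnings;

while (<>) {
  while (/ERRORID:(0x[0-9a-f]*)/) {
    my $id = $1;
    # Backticks capture the command's output; system() would only return its exit status.
    my $name = `grep $id errors.txt | grep -o '^[A-Z_]*'`;
    chomp $name;
    s/ERRORID:$id/ERROR:$name/g;
  }
  print;
}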

Then execute:

./thatScript.pl logfile.txt

断爱 2024-12-15 20:58:11

For people looking for a solution that uses only the shell and sed. Not perfect, but it works:

while IFS= read -r line; do
    id=$(printf '%s\n' "$line" | grep -o 'ERRORID:0x[0-9a-f]*' | grep -o '0x[0-9a-f]*' | head -n 1)
    if [ -n "$id" ]; then
        printf '%s\n' "$line" | sed "s/ERRORID:$id/ERROR:$(grep "$id" errors.txt | grep -o '^[A-Z_]*' | head -n 1)/g"
    else
        printf '%s\n' "$line"
    fi
done < logfile.txt

If you see ways to improve it, please share.

紫轩蝶泪 2024-12-15 20:58:11

With GNU awk for gensub() and the third argument to match():

$ awk '
    NR==FNR {
        map[$NF] = gensub(/,[^,]+$/,"",1)
        next
    }
    match($0,/(.*ERRORID:)(0x[[:xdigit:]]+)(.*)/,a) {
        $0 = a[1] (a[2] in map ? map[a[2]] : a[2]) a[3]
    }
1' errors.txt logfile.txt
Lot's of meaningless stuff ERRORID:LONG_ERROR_DESCRIPTION and something else =>

The above will run much faster than the sed scripts in the currently accepted answer, and it won't fail when LONG_ERROR_DESCRIPTION contains characters such as %, &, or \1. Nor will it fail when one error ID is a prefix of another. For example, if 0xdead and 0xdeadbeef are two separate error codes, the sed scripts can fail depending on the order in which they appear in errors.txt: by mapping 0xdead first, they could convert ERRORID:0xdeadbeef to ERROR:LONG_ERROR_DESCRIPTIONbeef.
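The ordering problem with overlapping IDs can be reproduced with a small sketch. The catalog below is hypothetical (SHORT_ERROR is an invented name), with 0xdead listed before 0xdeadbeef, and the generator is the sed command from the accepted answer adjusted to accept lowercase hex digits.

```shell
# Hypothetical catalog where one ID is a prefix of another
workdir=$(mktemp -d)
printf '%s\n' 'SHORT_ERROR, 0xdead' \
              'LONG_ERROR_DESCRIPTION, 0xdeadbeef' > "$workdir/errors.txt"

# Generate the substitution script; the rule for 0xdead comes first
sed 's/^\(.*\), \(0x[0-9A-Fa-f]*\)$/s%ERRORID:\2%ERROR:\1%g/' \
    "$workdir/errors.txt" > "$workdir/errors.sed"

# The first rule matches the front of ERRORID:0xdeadbeef and leaves "beef" behind,
# so the second rule never gets a chance to match.
mangled=$(printf '%s\n' 'x ERRORID:0xdeadbeef y' | sed -f "$workdir/errors.sed")
printf '%s\n' "$mangled"
# prints: x ERROR:SHORT_ERRORbeef y
```

The awk version avoids this because it extracts the whole run of hex digits first and then looks that complete ID up in the map.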
