
Using awk printf to urldecode text

Posted 2024-09-19 22:30:36

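I'm using awk to urldecode some text.

If I hard-code the string in the printf statement, e.g. printf "%s", "\x3D", it correctly outputs =. The same works if I store the whole escaped string in a variable.

However, if I only have 3D, how can I prepend \x so that printf prints = instead of \x3D?

I'm using busybox awk 1.4.2 and the ash shell.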
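A minimal sketch of the behaviour described above, assuming a POSIX-style awk (show_runtime_escape is a hypothetical helper name): escape sequences such as \x3D are expanded when awk parses a string literal in the program text, so a string assembled at run time from "\\x" and "3D" keeps its backslash and printf prints it unchanged.

```shell
#!/bin/sh
# show_runtime_escape HEX: hypothetical helper that builds "\x" HEX at run time.
# "\\x" in the awk source is a literal backslash followed by "x"; no escape
# processing happens on the concatenated result, so printf sees \x3D verbatim.
show_runtime_escape() {
    awk -v hex="$1" 'BEGIN { s = "\\x" hex; printf "%s", s }'
}

show_runtime_escape 3D; echo    # prints \x3D, not =
```

This is why the answers convert the two hex digits to a number and print it with %c rather than building an escape sequence.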

森林散布 2024-09-26 22:30:36

I don't know how you'd do this in awk, but it's trivial in perl:

echo "http://example.com/?q=foo%3Dbar" | 
    perl -pe 's/\+/ /g; s/%([0-9a-f]{2})/chr(hex($1))/eig'
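For comparison, a sketch of the same two substitutions in plain POSIX awk, assuming no gawk extensions are available (urldecode and the hex variable are hypothetical names, not from the answer). It turns + into a space, then converts each %XX by looking up both hex digits with index(), so it does not depend on the awk parsing "0x3D" as a number.

```shell
#!/bin/sh
# urldecode: hypothetical filter mirroring the perl one-liner in POSIX awk.
# Step 1: "+" becomes a space. Step 2: each %XX (hex, any case) becomes
# the character with that code, computed digit by digit via index().
urldecode() {
    awk '
    BEGIN { hex = "0123456789ABCDEF" }
    {
        s = $0
        gsub(/[+]/, " ", s)
        out = ""
        while (match(s, /%[0-9A-Fa-f][0-9A-Fa-f]/)) {
            code = toupper(substr(s, RSTART + 1, 2))
            n = (index(hex, substr(code, 1, 1)) - 1) * 16 + index(hex, substr(code, 2, 1)) - 1
            out = out substr(s, 1, RSTART - 1) sprintf("%c", n)
            s = substr(s, RSTART + RLENGTH)
        }
        print out s
    }'
}

echo "http://example.com/?q=foo%3Dbar+baz" | urldecode    # prints http://example.com/?q=foo=bar baz
```

Under gawk in a UTF-8 locale, codes above 127 may come out as multi-byte characters; running with LC_ALL=C keeps them as single bytes.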

变身佩奇 2024-09-26 22:30:36

GNU awk

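#!/usr/bin/gawk -nf
@include "ord"
BEGIN {
  # each record ends at a %XX escape; gawk stores the matched escape in RT
  RS = "%[[:xdigit:]][[:xdigit:]]"
}
{
  # chr() comes from gawk's ord library; -n (--non-decimal-data) lets "0x3D" be read as hex
  printf "%s", (RT ? $0 chr("0x" substr(RT, 2)) : $0)
}

or

#!/bin/sh
gawk -n -i ord '{printf "%s", (RT ? $0 chr("0x" substr(RT, 2)) : $0)}' RS='%[[:xdigit:]][[:xdigit:]]'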

Decoding URL encoding (percent encoding)

谷夏 2024-09-26 22:30:36

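Since you're using ash and Perl isn't available, I'm assuming that you may not have gawk either.

For me, using gawk or busybox awk, your second example works the same as the first (I get "=" from both) unless I use the --posix option (in which case I get "x3D" for both).

If I use --non-decimal-data or --traditional with gawk, I get "=".

What version of AWK are you using (awk, nawk, gawk, busybox, and which version number)?

Edit:

You can coerce the variable's string value into a numeric one by adding zero: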

~/busybox/awk 'BEGIN { string="3D"; pre="0x"; hex=pre string; printf "%c", hex+0}'
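A hedged alternative, assuming an awk that does not read "0x3D" as hexadecimal (gawk without --non-decimal-data, for instance): convert the two digits by hand with index() and print the result with %c. hex_to_char is a hypothetical helper name.

```shell
#!/bin/sh
# hex_to_char HEX: hypothetical helper printing the character whose code is
# the two-digit hex number HEX (e.g. 3D -> "="). Each digit's value is its
# position in "0123456789ABCDEF" minus one, so no hex parsing is required.
hex_to_char() {
    awk -v h="$1" 'BEGIN {
        digits = "0123456789ABCDEF"
        h = toupper(h)
        n = (index(digits, substr(h, 1, 1)) - 1) * 16
        n += index(digits, substr(h, 2, 1)) - 1
        printf "%c", n
    }'
}

hex_to_char 3D; echo    # prints =
```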

茶色山野 2024-09-26 22:30:36

This relies on GNU awk's extension of the split function (the fourth, seps argument), but it works:

gawk '{ numElems = split($0, arr, /%../, seps);
        outStr = ""
        for (i = 1; i <= numElems - 1; i++) {
            outStr = outStr arr[i]
            outStr = outStr sprintf("%c", strtonum("0x" substr(seps[i],2)))
        }
        outStr = outStr arr[i]
        print outStr
      }'

愿得七秒忆 2024-09-26 22:30:36

First off, I'm aware this is an old question, but none of the answers worked for me (I'm restricted to busybox awk).

Two options. To parse stdin:

awk '{for (y=0;y<127;y++) if (y!=37) gsub(sprintf("%%%02x|%%%02X",y,y), y==38 ? "\\&" : sprintf("%c", y));gsub(/%25/, "%");print}'

To take a command-line argument:

awk 'BEGIN {for (y=0;y<127;y++) if (y!=37) gsub(sprintf("%%%02x|%%%02X",y,y), y==38 ? "\\&" : sprintf("%c", y), ARGV[1]);gsub(/%25/, "%", ARGV[1]);print ARGV[1]}' parameter

%25 has to be handled last; otherwise a string like %253D is decoded twice, first to %3D and then to =, when it should stop at %3D.
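A small sketch of the double-decoding problem, assuming any POSIX awk; decode_25_first and decode_25_last are hypothetical helpers that decode only %25 and %3D and differ solely in the order of the two gsub calls:

```shell
#!/bin/sh
# Both helpers decode only %25 and %3D; only the gsub order differs.
decode_25_first() {
    # %25 -> "%" first: the new "%" joins the following "3D" and is decoded again
    awk '{ gsub(/%25/, "%"); gsub(/%3D/, "="); print }'
}

decode_25_last() {
    # %25 -> "%" last, as in the answer: the new "%" is never revisited
    awk '{ gsub(/%3D/, "="); gsub(/%25/, "%"); print }'
}

echo '%253D' | decode_25_first    # prints =   (decoded twice)
echo '%253D' | decode_25_last     # prints %3D (decoded once)
```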

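The inline check for y==38 is there because gsub treats & in the replacement as a special character (the matched text) unless you escape it with a backslash.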
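A minimal sketch of the & behaviour, assuming a POSIX awk (amp_plain and amp_escaped are hypothetical names). In the program text, "\\&" is a backslash followed by &, which gsub inserts as a literal &; a bare "&" is replaced by the matched text, so %26 is left unchanged:

```shell
#!/bin/sh
# amp_plain: "&" in the replacement means "the matched text", so %26 stays %26
amp_plain() {
    awk '{ gsub(/%26/, "&"); print }'
}

# amp_escaped: "\\&" in the awk source reaches gsub as \& and yields a literal "&"
amp_escaped() {
    awk '{ gsub(/%26/, "\\&"); print }'
}

echo 'a%26b' | amp_plain      # prints a%26b
echo 'a%26b' | amp_escaped    # prints a&b
```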

拒绝两难 2024-09-26 22:30:36

This one is the fastest of them all by a large margin, and it doesn't need gawk:

#!/usr/bin/mawk -f

function decode_url(url,            dec, tmp, pre, mid, rep) {
    tmp = url
    # find the next %XX escape (two hex digits)
    while (match(tmp, /%[0-9A-Fa-f][0-9A-Fa-f]/)) {
        pre = substr(tmp, 1, RSTART - 1)
        mid = substr(tmp, RSTART + 1, RLENGTH - 1)
        # relies on this awk reading "0x3D" as hex (gawk needs --non-decimal-data)
        rep = sprintf("%c", ("0x" mid) + 0)
        dec = dec pre rep
        tmp = substr(tmp, RSTART + RLENGTH)
    }
    return dec tmp
}

{
    print decode_url($0)
}

Save it as decode_url.awk and use it as you normally would. For example:

$ ./decode_url.awk <<< 'Hello%2C%20world%20%21'
Hello, world !

But if you want an even faster version:

#!/usr/bin/mawk -f

# Build a table mapping "%XX" and "%xx" to the byte each one encodes.
# Under gawk in a UTF-8 locale, codes above 127 may need LC_ALL=C to stay single bytes.
function gen_url_decode_array(      i, n, c) {
    delete decodeArray
    for (i = 1; i < 256; ++i) {
        c = sprintf("%c", i)
        n = sprintf("%%%02X", i)
        decodeArray[n] = c
        decodeArray[tolower(n)] = c
    }
}

function decode_url(url,            dec, tmp, pre, mid, rep) {
    tmp = url
    while (match(tmp, /%[0-9A-Fa-f][0-9A-Fa-f]/)) {
        pre = substr(tmp, 1, RSTART - 1)
        mid = substr(tmp, RSTART, RLENGTH)
        rep = decodeArray[mid]
        dec = dec pre rep
        tmp = substr(tmp, RSTART + RLENGTH)
    }
    return dec tmp
}

BEGIN {
    gen_url_decode_array()
}

{
    print decode_url($0)
}

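Interpreters other than mawk should have no problem with them either, except that the first version depends on "0x.." strings converting to numbers, which gawk does not do by default.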
