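How do I decode or create those %-encodings on the web using Perl?

Posted 2024-10-08 22:44:06

I need to handle URI (percent) encoding and decoding in my Perl script. How do I do that?

This is a question from the official perlfaq. We're importing the perlfaq to Stack Overflow.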

別甾虛僞 2024-10-15 22:44:06

This is the official FAQ answer minus subsequent edits.

Those % encodings handle reserved characters in URIs, as described in RFC 2396, Section 2. This encoding replaces the reserved character with the hexadecimal representation of the character's number from the US-ASCII table. For instance, a colon, :, becomes %3A.
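To make the mapping concrete, here is a minimal sketch using only core Perl. The helper name `percent_code` is hypothetical (it does not come from any module), and it assumes a single character in the US-ASCII range.

```perl
use strict;
use warnings;

# Hypothetical helper: return the %XX code for one US-ASCII character.
sub percent_code {
    my ($char) = @_;
    # "%%" emits a literal percent sign; "%02X" emits two uppercase hex digits.
    return sprintf '%%%02X', ord $char;
}

print percent_code(':'), "\n";   # %3A
print percent_code(' '), "\n";   # %20
print percent_code('#'), "\n";   # %23
```

The colon is character number 58 in US-ASCII, which is 3A in hexadecimal, hence `%3A`.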

In CGI scripts, you don't have to worry about decoding URIs if you are using CGI.pm. You shouldn't have to process the URI yourself, either on the way in or the way out.

If you have to encode a string yourself, remember that you should never try to encode an already-composed URI. You need to escape the components separately, then put them together. To encode a string, you can use the URI::Escape module. The uri_escape function returns the escaped string:

use URI::Escape;

my $original = "Colon : Hash # Percent %";
my $escaped  = uri_escape( $original );
print "$escaped\n"; # 'Colon%20%3A%20Hash%20%23%20Percent%20%25'

To decode the string, use the uri_unescape function:

my $unescaped = uri_unescape( $escaped );
print $unescaped; # back to original

If you wanted to do it yourself, you simply need to replace the reserved characters with their encodings. A global substitution is one way to do it:

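# encode: percent-escape every character outside the unreserved set.
# Caveat: "%0x" does not zero-pad, so a character numbered below 0x10
# (a newline, for example) comes out as a single hex digit.
$string =~ s/([^^A-Za-z0-9\-_.!~*'()])/ sprintf "%%%0x", ord $1 /eg;

# decode: turn each %XX pair back into the character it names
$string =~ s/%([A-Fa-f\d]{2})/chr hex $1/eg;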
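The advice about escaping components separately can be sketched as follows, using only core Perl. Everything named here is an assumption for illustration: `escape_component` is a hypothetical helper standing in for uri_escape, the parameter names and values are made up, and `www.example.com` is a placeholder host. The helper uses the same safe-character list as the substitution above (without the doubled caret) and pads each code to two hex digits.

```perl
use strict;
use warnings;

# Hypothetical helper standing in for uri_escape: escape every character
# outside the safe list (assumes plain US-ASCII input).
sub escape_component {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-_.!~*'()])/sprintf '%%%02X', ord $1/eg;
    return $text;
}

# Hypothetical parameters; the values contain characters that are
# significant inside a query string (&, =, #, space).
my %params = ( q => 'perl & regex', tag => 'a=b#c' );

# Escape each key and value on its own, then join with the delimiters.
my $query = join '&',
    map { escape_component($_) . '=' . escape_component( $params{$_} ) }
    sort keys %params;

my $url = "http://www.example.com/search?$query";
print "$url\n";   # http://www.example.com/search?q=perl%20%26%20regex&tag=a%3Db%23c
```

Escaping the finished URL instead would also escape the `?`, `&`, and `=` that are meant to act as delimiters, which is why the components are handled one at a time.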

满栀 2024-10-15 22:44:06

DIY encode (improving on the version above):

$string =~ s/([^^A-Za-z0-9\-_.!~*'()])/ sprintf "%%%02x", ord $1 /eg;

(note the '%02x' rather than only '%0x')
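A small sketch of why the padding matters, using only core Perl. The variable names are illustrative; the example assumes a newline (character number 10) as the input, since any character numbered below 16 shows the same effect.

```perl
use strict;
use warnings;

my $newline = "\n";   # character number 10, i.e. 0x0A

my $unpadded = sprintf '%%%0x',  ord $newline;   # no minimum width
my $padded   = sprintf '%%%02x', ord $newline;   # always two hex digits

print "$unpadded\n";   # %a
print "$padded\n";     # %0a

# The decode pattern requires exactly two hex digits after the percent sign,
# so only the padded form converts back to a newline.
( my $from_unpadded = $unpadded ) =~ s/%([A-Fa-f\d]{2})/chr hex $1/eg;
( my $from_padded   = $padded )   =~ s/%([A-Fa-f\d]{2})/chr hex $1/eg;

print $from_unpadded eq $newline ? "ok\n" : "not decoded\n";   # not decoded
print $from_padded   eq $newline ? "ok\n" : "not decoded\n";   # ok
```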

DIY decode (adding '+' -> ' '):

$string =~ s/\+/ /g; $string =~ s/%([A-Fa-f\d]{2})/chr hex $1/eg;
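Treating `+` as a space is the convention for form data (application/x-www-form-urlencoded) rather than for URIs in general. Below is a minimal sketch of the two-step decode wrapped in a sub; the name `form_decode` is hypothetical and the sample input is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: decode one form-style value.
sub form_decode {
    my ($text) = @_;
    $text =~ s/\+/ /g;                            # '+' stands for a space
    $text =~ s/%([A-Fa-f\d]{2})/chr hex $1/eg;    # then expand %XX codes
    return $text;
}

print form_decode('1%2B1+%3D+2'), "\n";   # 1+1 = 2
```

The order of the two substitutions matters: expanding `%2B` first would produce a literal plus sign that the second step would then turn into a space.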

Coders helping coders - bartering knowledge!

错々过的事 2024-10-15 22:44:06

Maybe this will help in deciding which method to choose.

Benchmarks on Perl 5.32. Every function returns the same result for the given $input.

Code:

#!/usr/bin/env perl

my $input = "ala ma 0,5 litra 40%'owej vodki :)";

use Net::Curl::Easy;
my $easy = Net::Curl::Easy->new();
use URI::Encode qw( uri_encode );
use URI::Escape qw( uri_escape );
use Benchmark qw( cmpthese );

cmpthese(-3, {
    'a' => sub {
        my $string = $input;
        $string =~ s/([^^A-Za-z0-9\-_.!~*'()])/ sprintf "%%%0x", ord $1 /eg;
    },
    'b' => sub {
        my $string = $input;
        $string = $easy->escape( $string );
    },
    'c' => sub {
        my $string = $input;
        $string = uri_encode( $string, {encode_reserved => 1} ); 
    },
    'd' => sub {
        my $string = $input;
        $string = uri_escape( $string );
    },
});

And results:

       Rate      c      d      a      b
c    5618/s     --   -98%   -99%  -100%
d  270517/s  4716%     --   -31%   -80%
a  393480/s  6905%    45%     --   -71%
b 1354747/s 24016%   401%   244%     --

Not surprising. A specialized C solution is the fastest. An in-place regex with no sub calls is quite fast, followed closely by a copying regex with a sub call. I didn't look into why uri_encode was so much worse than uri_escape.
