How do I decode or create those %-encodings on the web using Perl?
I need to handle URI (i.e. percent) encoding and decoding in my Perl script. How do I do that?
This is a question from the official perlfaq. We're importing the perlfaq to Stack Overflow.
This is the official FAQ answer minus subsequent edits.
Those `%` encodings handle reserved characters in URIs, as described in RFC 2396, Section 2. This encoding replaces the reserved character with the hexadecimal representation of the character's number from the US-ASCII table. For instance, a colon, `:`, becomes `%3A`.
In CGI scripts, you don't have to worry about decoding URIs if you are using CGI.pm. You shouldn't have to process the URI yourself, either on the way in or the way out.
If you have to encode a string yourself, remember that you should never try to encode an already-composed URI. You need to escape the components separately, then put them together. To encode a string, you can use the URI::Escape module. The `uri_escape` function returns the escaped string. To decode the string, use the `uri_unescape` function.
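A minimal sketch of the round trip described above, assuming the URI::Escape module from CPAN is installed. The sample string is chosen only for illustration.

```perl
use strict;
use warnings;
use URI::Escape qw(uri_escape uri_unescape);

# Sample input, chosen only for illustration.
my $original = "Colon : Hash # Percent %";

# Escape every character outside the module's default "safe" set.
my $escaped = uri_escape($original);
print "$escaped\n";

# Reverse the escaping to recover the input.
my $unescaped = uri_unescape($escaped);
print "$unescaped\n";
```

Note that this escapes a single component. Passing a complete URI through `uri_escape` would also escape the `:` and `/` that give the URI its structure.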
If you wanted to do it yourself, you simply need to replace the reserved characters with their encodings. A global substitution is one way to do it:
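One way such a substitution could look, as a sketch rather than the FAQ's exact code. It assumes the input is a byte string (no characters above 255) and treats the RFC 2396 unreserved set (letters, digits, and `-_.!~*'()`) as safe. The variable names are illustrative.

```perl
use strict;
use warnings;

# Sample input, chosen only for illustration.
my $string = "Colon : Hash # Percent %";

# Encode: replace each character outside the unreserved set with
# '%' followed by its two-digit hexadecimal character code.
(my $encoded = $string) =~ s/([^A-Za-z0-9\-_.!~*'()])/sprintf("%%%02X", ord($1))/ge;

# Decode: turn each %XX sequence back into the character it names.
(my $decoded = $encoded) =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;

print "$encoded\n$decoded\n";
```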
DIY encode (improving above version):
(note the '%02x' rather than only '%0x')
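The difference between the two format strings shows up for characters whose code is below 0x10. The sketch below compares them; the helper names `encode_unpadded` and `encode_padded` are hypothetical and exist only for this comparison.

```perl
use strict;
use warnings;

# Hypothetical helper: '%0x' sets the zero flag but gives no width,
# so codes below 0x10 come out as a single hex digit.
sub encode_unpadded {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-_.!~*'()])/sprintf("%%%0x", ord($1))/ge;
    return $s;
}

# Hypothetical helper: '%02x' always emits two hex digits.
sub encode_padded {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-_.!~*'()])/sprintf("%%%02x", ord($1))/ge;
    return $s;
}

my $input    = "a\tb";                   # a tab has character code 9
my $unpadded = encode_unpadded($input);  # the tab becomes "%9"
my $padded   = encode_padded($input);    # the tab becomes "%09"

print "$unpadded\n$padded\n";
```

With the unpadded form, a decoder that expects two hex digits would read `%9b` as a single escape and swallow the following `b`, which is why the padded form is the safer choice.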
DIY decode (adding '+' -> ' '):
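A sketch of a decoder along these lines, assuming form-style input where `+` stands for a space. The function name `decode_form_value` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper for form-encoded values.
sub decode_form_value {
    my ($s) = @_;
    $s =~ tr/+/ /;                              # '+' stands for a space
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;  # then undo the %XX escapes
    return $s;
}

my $decoded = decode_form_value("a+b%2Bc%20d");
print "$decoded\n";
```

The order matters: translating `+` first means a literal plus sign, sent as `%2B`, survives the decoding.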
Maybe this will help in deciding which method to choose.
Benchmarks on Perl 5.32. Every function returns the same result for a given `$input`.
Code:
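A comparison of this kind could be set up with the core Benchmark module. The sketch below is an assumption, not the code behind the figures discussed in this answer: it covers only the two regex variants (an in-place substitution and a copying sub call), and the input string and the name `encode_copy` are illustrative.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Illustrative input; a real comparison should use representative data.
my $input = "Colon : Hash # Percent %" x 10;

# Copying variant: a sub call that returns an encoded copy.
sub encode_copy {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-_.!~*'()])/sprintf("%%%02X", ord($1))/ge;
    return $s;
}

# Sanity check: both variants must agree before timing them.
my $in_place = $input;
$in_place =~ s/([^A-Za-z0-9\-_.!~*'()])/sprintf("%%%02X", ord($1))/ge;
die "variants disagree" unless $in_place eq encode_copy($input);

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    in_place => sub {
        my $s = $input;
        $s =~ s/([^A-Za-z0-9\-_.!~*'()])/sprintf("%%%02X", ord($1))/ge;
    },
    copy_sub => sub {
        my $s = encode_copy($input);
    },
});
```

Module-based encoders could be added to the same hash as further entries once they are installed.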
And results:
Not surprising. A specialized C solution is the fastest. An in-place regex with no sub calls is quite fast, followed closely by a copying regex with a sub call. I didn't look into why `uri_encode` was so much worse than `uri_escape`.
`use URI` and it will make URLs that just work.
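A short sketch of that approach, assuming the URI module from CPAN is installed. The host `example.com` and the parameter name `q` are placeholders.

```perl
use strict;
use warnings;
use URI;

# example.com and the parameter name are placeholders.
my $uri = URI->new("http://example.com/search");

# query_form escapes the key and value itself, so the raw value is passed in.
$uri->query_form(q => "a b&c");
print $uri->as_string, "\n";

# Reading the form back returns the decoded key/value pairs.
my %form = $uri->query_form;
print "$form{q}\n";
```

Letting the module assemble the query string avoids the mistake mentioned earlier in this thread of escaping an already-composed URI.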