How should a web server decode escaped reserved characters in a request URI?

Posted on 2024-11-05 11:46:54

It is pretty clear that a web server has to decode any escaped unreserved character (such as alphanums, etc.) to do the URI comparison. For example, http://www.example.com/~user/index.htm shall be identical to http://www.example.com/%7Euser/index.htm.

My question is, what are we gonna do with the escaped reserved characters?

An example would be %2F, the escaped form of /. If there is a %2F in the request URI, should the web server's parser replace it with a /? In the above example, that would mean http://www.example.com/~user%2Findex.htm is the same as http://www.example.com/~user/index.htm. However, when I tried it on an Apache server (2.2.17, Unix), it returned a "404 Not Found" error.

So does that mean %2F and other escaped reserved characters shall be left alone (at least before the URI comparison)?
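To make the concern concrete, here is a minimal Python sketch. The helper name `path_segments` is made up for illustration, and this is not a description of how Apache handles the request. It splits the path on literal `/` first and only then percent-decodes each segment, so an escaped `%2F` remains data inside a single segment instead of acting as a delimiter.

```python
from typing import List
from urllib.parse import unquote


def path_segments(path: str) -> List[str]:
    """Split on literal '/' first, then percent-decode each segment.

    Hypothetical helper for illustration only.
    """
    return [unquote(segment) for segment in path.split("/")]


# Two segments after the leading empty one: "~user" and "index.htm".
plain = path_segments("/~user/index.htm")

# One segment after the leading empty one: "~user/index.htm".
escaped = path_segments("/~user%2Findex.htm")

print(plain)    # ['', '~user', 'index.htm']
print(escaped)  # ['', '~user/index.htm']
```

Under this reading the two paths have different structures, which would be consistent with a server refusing to treat them as the same resource.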

Background information:

There are two places in RFC 2616 (HTTP 1.1) mentioning the escape decoding issue:

The Request-URI is transmitted in the format specified in section 3.2.1. If the Request-URI is encoded using the “% HEX HEX” encoding [42], the origin server MUST decode the Request-URI in order to properly interpret the request. Servers SHOULD respond to invalid Request-URIs with an appropriate status code.

and

Characters other than those in the “reserved” and “unsafe” sets (see RFC 2396 [42]) are equivalent to their “"%" HEX HEX” encoding.

(according to http://trac.tools.ietf.org/wg/httpbis/trac/ticket/2 "unsafe" is a mistake and shall be removed from the spec. So we are only looking at "reserved" here.)

FYI, the definition of such characters in RFC 2396:

reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | ","

unreserved = alphanum | mark

mark = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
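As a small sketch, the RFC 2396 character classes quoted above can be written out as Python sets. The names `RESERVED`, `MARK`, `UNRESERVED` and `classify` are illustrative and not part of any library.

```python
import string

# Character classes transcribed from the RFC 2396 grammar quoted above.
RESERVED = set(";/?:@&=+$,")
MARK = set("-_.!~*'()")
UNRESERVED = set(string.ascii_letters + string.digits) | MARK


def classify(char: str) -> str:
    """Return the RFC 2396 class of a single character (illustrative helper)."""
    if char in RESERVED:
        return "reserved"
    if char in UNRESERVED:
        return "unreserved"
    return "other"


print(classify("/"))  # reserved
print(classify("~"))  # unreserved
print(classify(" "))  # other
```

With this classification, `%7E` (`~`) is an escaped unreserved character, while `%2F` (`/`) is an escaped reserved character, which is exactly the case the question is about.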

Comments (1)

桜花祭 2024-11-12 11:46:55

tl;dr:

Decode percent-encoded unreserved characters,
keep percent-encoded reserved characters.


The URI standard is STD 66, which currently is RFC 3986.

Section 6 is about Normalization and Comparison, where section 6.2.2.2 explains what to do with percent-encoded octets:

These URIs should be normalized by decoding any percent-encoded octet that corresponds to an unreserved character […]

As explicitly stated in section 2 (section 2.3 for unreserved and section 2.2 for reserved characters):

  • Unreserved characters:

    URIs that differ in the replacement of an unreserved character with its corresponding percent-encoded US-ASCII octet are equivalent

  • Reserved characters:

    URIs that differ in the replacement of a reserved character with its corresponding percent-encoded octet are not equivalent.
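The rule above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a complete URI normalizer: the function name `normalize_percent_encoding` is made up, and the unreserved set is the RFC 3986 one (letters, digits, `-`, `.`, `_`, `~`), which is narrower than the RFC 2396 set quoted in the question. The sketch also uppercases the hex digits of the triplets it keeps, as section 6.2.2.1 recommends.

```python
import re
import string

# Unreserved characters per RFC 3986 section 2.3.
UNRESERVED_3986 = set(string.ascii_letters + string.digits + "-._~")

# Matches one percent-encoded octet, e.g. "%7E" or "%2f".
_PCT_TRIPLET = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_percent_encoding(uri: str) -> str:
    """Decode escaped unreserved characters; keep every other escape.

    Illustrative helper: kept triplets only get uppercase hex digits.
    """
    def _replace(match):
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED_3986:
            return char  # escaped unreserved character: decode it
        return "%" + match.group(1).upper()  # anything else stays escaped

    return _PCT_TRIPLET.sub(_replace, uri)


print(normalize_percent_encoding("http://www.example.com/%7Euser/index.htm"))
# http://www.example.com/~user/index.htm
print(normalize_percent_encoding("http://www.example.com/~user%2findex.htm"))
# http://www.example.com/~user%2Findex.htm
```

Because the substitution is a single pass, an escaped percent sign such as `%25` is left alone and cannot trigger a second round of decoding.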
