Parsing HTTP header values: quoting, RFC 5987, MIME, etc.
What confuses me is the decoding of HTTP header values.
Example header: Some-Header: "quoted string?"; *utf-8'en'Weirdness
Can header values be quoted? What about the encoding of a " itself? Is ' a valid quote character? What's the significance of a semicolon (;)? Could the value parser for an HTTP header be considered a MIME parser?
I am making a transparent proxy that needs to transparently handle and modify many in-the-wild header fields. That's why I need so much detail on the format.
If you are asking whether the RFC 5987 parameter production applies to the main part of the header value, then no. Here the main part of the header value would probably be "foo", including the quotes, but...
The specific handling is defined separately for each named header. So the semicolon is significant for, say, Content-Disposition, but not for Content-Length.
Obviously this is not a very satisfactory solution, but that's what we're stuck with.
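To make the per-header point concrete, here is a minimal Python sketch for a header that does use parameters, such as Content-Disposition. It assumes the quoted-string grammar of RFC 2616 section 2.2: only the double quote delimits a quoted-string, a literal " inside one is written as the quoted-pair \", and ' plays no quoting role at all (it is an ordinary token character). The function names are hypothetical, and the code makes no attempt to cope with malformed values seen in the wild.

```python
def split_params(value: str) -> list[str]:
    """Split a header value on ';', ignoring any ';' inside a quoted-string."""
    parts: list[str] = []
    buf: list[str] = []
    in_quotes = escaped = False
    for ch in value:
        if escaped:
            # character after a backslash inside quotes: taken literally
            buf.append(ch)
            escaped = False
        elif in_quotes and ch == "\\":
            buf.append(ch)  # keep the backslash; unquote_string resolves it later
            escaped = True
        elif ch == '"':
            buf.append(ch)
            in_quotes = not in_quotes
        elif ch == ";" and not in_quotes:
            parts.append("".join(buf).strip())
            buf = []
        else:
            buf.append(ch)
    parts.append("".join(buf).strip())
    return parts


def unquote_string(s: str) -> str:
    """Turn an RFC 2616 quoted-string into its value; leave anything else alone."""
    if len(s) < 2 or s[0] != '"' or s[-1] != '"':
        return s  # a token (or something malformed), not a quoted-string
    out: list[str] = []
    escaped = False
    for ch in s[1:-1]:
        if escaped:
            out.append(ch)  # quoted-pair: "\" CHAR stands for CHAR
            escaped = False
        elif ch == "\\":
            escaped = True
        else:
            out.append(ch)
    return "".join(out)


def parse_parameterised(value: str) -> tuple[str, dict[str, str]]:
    """Split e.g. a Content-Disposition value into its main part and parameters."""
    main, *params = split_params(value)
    result: dict[str, str] = {}
    for p in params:
        name, sep, raw = p.partition("=")
        if sep:
            # parameter names are case-insensitive, so normalise them
            result[name.strip().lower()] = unquote_string(raw.strip())
    return main, result


print(parse_parameterised('attachment; filename="say \\"hi\\"; bye.txt"'))
# ('attachment', {'filename': 'say "hi"; bye.txt'})
```

Note that unquote_string("'single'") returns its input unchanged, since single quotes are not quote characters here. Running split_params over a header whose grammar has no parameters, such as Content-Length, would be meaningless.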
You can't handle these in a generic way; you have to know the form of each possible header. For anything you don't recognise, don't attempt to decompose the header value. In any case, so little software supports RFC 5987 at the moment that you're unlikely to be able to do much useful handling of it.
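One way a proxy might follow this advice is a dispatch table keyed on the lower-cased header name: headers you have written a parser for get handled, and everything else is forwarded untouched. This is a sketch assuming Python; HANDLERS, process_header and the Content-Length check are illustrative names and rules, not part of any library.

```python
import re
from typing import Callable


def handle_content_length(value: str) -> str:
    # Content-Length is just 1*DIGIT: a ';' or quote here is an error, not syntax.
    v = value.strip()
    if not re.fullmatch(r"[0-9]+", v):
        raise ValueError(f"bad Content-Length: {value!r}")
    return v


# Hypothetical table: one entry per header whose grammar this proxy understands.
HANDLERS: dict[str, Callable[[str], str]] = {
    "content-length": handle_content_length,
}


def process_header(name: str, value: str) -> str:
    handler = HANDLERS.get(name.lower())  # header names are case-insensitive
    if handler is None:
        return value  # unrecognised header: forward verbatim, never decompose
    return handler(value)


print(process_header("Some-Header", '"quoted string?"; *utf-8\'en\'Weirdness'))
print(process_header("Content-Length", " 42 "))
```

The design choice is deliberately conservative: an unknown header is treated as an opaque string, so a proxy cannot corrupt a format it does not understand.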
The status quo today is that non-ASCII characters in header values don't work well enough across browsers to be used at all, whether encoded or raw.
Luckily they are rarely needed. The only really common use case is non-ASCII filenames for Content-Disposition, but that's easier to work around by putting the filename in a trailing URL path segment instead.
No, an HTTP header value parser is not a MIME parser. HTTP borrows heavily from MIME and the RFC 822 family of standards in general, but it isn't part of the 822 family. It has its own low-level grammar for headers, which looks like 822 but isn't quite compatible. Arbitrary MIME features can't be used in HTTP; there has to be a standardisation mechanism to bring them into HTTP explicitly, and RFC 5987 is exactly that for (parts of) RFC 2231.
(See section 19.4 of RFC 2616 for a discussion of some other differences.)
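For the part of RFC 2231 that RFC 5987 does bring into HTTP, namely the ext-value form charset'language'percent-encoded-bytes (as in filename*=UTF-8''na%C3%AFve.txt), a decoder and encoder might look like the following sketch. It assumes a simplified grammar (the language tag is not validated against RFC 5646) and accepts only UTF-8 and ISO-8859-1, the two charsets RFC 5987 singles out; decode_ext_value and encode_ext_value are hypothetical names.

```python
import re
from urllib.parse import quote, unquote

# Simplified ext-value grammar: charset ' [language] ' value-chars
_EXT_VALUE = re.compile(
    r"(?P<charset>[A-Za-z0-9!#$%&+\-^_`{}~]+)"
    r"'(?P<lang>[A-Za-z0-9\-]*)'"
    r"(?P<chars>(?:%[0-9A-Fa-f]{2}|[A-Za-z0-9!#$&+\-.^_`|~])*)"
)


def decode_ext_value(ext_value: str) -> tuple[str, str, str]:
    """Return (charset, language, text) for an RFC 5987 ext-value."""
    m = _EXT_VALUE.fullmatch(ext_value)
    if m is None:
        raise ValueError(f"not an ext-value: {ext_value!r}")
    charset = m.group("charset").lower()
    if charset not in ("utf-8", "iso-8859-1"):
        raise ValueError(f"unsupported charset: {charset}")
    # strict errors: invalid byte sequences raise instead of becoming U+FFFD
    text = unquote(m.group("chars"), encoding=charset, errors="strict")
    return charset, m.group("lang"), text


def encode_ext_value(text: str, lang: str = "") -> str:
    """Percent-encode text as UTF-8, leaving RFC 5987 attr-chars unescaped."""
    # quote() never escapes letters, digits and "_.-~"; add the other attr-chars
    return f"UTF-8'{lang}'" + quote(text, safe="!#$&+^`|")


print(decode_ext_value("UTF-8''na%C3%AFve.txt"))
print(encode_ext_value("naïve.txt"))
```

The question's own fragment decodes as decode_ext_value("utf-8'en'Weirdness") == ("utf-8", "en", "Weirdness"), but in RFC 5987 an ext-value only appears as the value of a parameter written name*=..., not as a bare *... item.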
In theory, a multipart form submission is part of the 822 family, and you should be able to use RFC 2231 encoding there. But in reality browsers don't support that either.
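RFC 2231 also lets a long parameter value be split into numbered continuations (title*0*, title*1*, ...), a mechanism RFC 5987 does not carry into HTTP. A server that chose to accept it in multipart bodies anyway would have to reassemble the segments roughly as in this hypothetical join_rfc2231 sketch, which assumes the parameters have already been split and any quoted-string values unquoted. Percent-decoded bytes are concatenated before decoding, because a multi-byte UTF-8 character may straddle two segments.

```python
import re
from urllib.parse import unquote_to_bytes

# name*N (literal segment) or name*N* (percent-encoded segment)
_SEGMENT = re.compile(r"(?P<name>[^*]+)\*(?P<num>[0-9]+)(?P<enc>\*?)")


def join_rfc2231(params: dict[str, str], name: str) -> str:
    """Reassemble name*0*, name*1*, name*2, ... into one decoded string."""
    segments = []
    for key, value in params.items():
        m = _SEGMENT.fullmatch(key)
        if m and m.group("name").lower() == name.lower():
            segments.append((int(m.group("num")), bool(m.group("enc")), value))
    if not segments:
        raise KeyError(name)
    segments.sort()
    charset = "us-ascii"
    raw = bytearray()
    for expected, (num, encoded, value) in enumerate(segments):
        if num != expected:
            raise ValueError(f"missing or duplicate continuation {expected}")
        if encoded:
            if num == 0:
                # only the first segment carries charset'language'
                charset, _language, value = value.split("'", 2)
            raw += unquote_to_bytes(value)
        else:
            raw += value.encode("ascii")
    # decode once, after joining, so multi-byte characters may span segments
    return raw.decode(charset or "us-ascii")


print(join_rfc2231({
    "title*0*": "us-ascii'en'This%20is%20even%20more%20",
    "title*1*": "%2A%2A%2Afun%2A%2A%2A%20",
    "title*2": "isn't it!",
}, "title"))
```

The input above is the continuation example from RFC 2231 itself and yields "This is even more ***fun*** isn't it!".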