Parsing HTTP header values: quoting, RFC 5987, MIME, etc.

Posted 2024-12-22 16:20:55 · 347 words · 5 views · 0 comments

What confuses me is the decoding of HTTP header values.

Example Header:
Some-Header: "quoted string?"; *utf-8'en'Weirdness

Can header values be quoted? What about the encoding of a " itself? Is ' a valid quote character? What's the significance of a semicolon (;)? Could the value parser for an HTTP header be considered a MIME parser?

I am making a transparent proxy that needs to transparently handle and modify many in-the-wild header fields. That's why I need so much detail on the format.

Comments (1)

等风来 2024-12-29 16:20:55

Can header values be quoted?

If you mean whether the RFC 5987 parameter production applies to the main part of the header value, then no.

Some-Header: "foo"; bar*=utf-8'en'bof

Here the main part of the header value would probably be "foo" including the quotes, but...

What's the significance of a semi-colon (;)?

The specific handling is defined for each named header separately. So semicolon is significant for, say, Content-Disposition, but not for Content-Length.

Obviously this is not a very satisfactory solution but that's what we're stuck with.
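For the headers whose grammar does define `;`-separated parameters (Content-Disposition, Content-Type and the like), a splitter has to leave quoted strings alone. Inside a quoted string a backslash escapes the next character, which is how a literal `"` gets encoded. The single quote is not a quoting character; in RFC 5987 it only separates charset, language and value. Here is a minimal sketch of that, assuming the usual HTTP quoted-string rules. `split_params` is a name made up for this example, not a library function.

```python
def split_params(value: str) -> list[str]:
    """Split a header value on ';' while leaving quoted strings intact."""
    parts: list[str] = []
    buf: list[str] = []
    in_quotes = False
    escaped = False
    for ch in value:
        if escaped:
            # The character after a backslash inside quotes is literal.
            buf.append(ch)
            escaped = False
        elif in_quotes and ch == "\\":
            buf.append(ch)
            escaped = True
        elif ch == '"':
            in_quotes = not in_quotes
            buf.append(ch)
        elif ch == ";" and not in_quotes:
            # Only an unquoted semicolon ends the current part.
            parts.append("".join(buf).strip())
            buf = []
        else:
            buf.append(ch)
    parts.append("".join(buf).strip())
    return parts


print(split_params('"foo"; bar*=utf-8\'en\'bof'))
```

Only apply something like this to headers you know use the parameter syntax; for other headers a semicolon is just another character.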

I am making a transparent proxy that needs to transparently handle and modify many in-the-wild header fields.

You can't handle these in a generic way; you have to know the form of each possible header. For anything you don't recognise, don't attempt to decompose the header value. And really, so little out there supports RFC 5987 at the moment that it's unlikely you'll be able to do much useful handling of it.
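In a proxy that usually comes down to a table of per-header handlers, with everything else passed through untouched. A rough sketch of the idea follows; the `HANDLERS` registry, the `handles` decorator and `process_header` are all hypothetical names for illustration, and the Content-Length handler only stands in for whatever rewriting you actually need.

```python
from typing import Callable

# Hypothetical registry: lowercase header name -> function that may rewrite the value.
HANDLERS: dict[str, Callable[[str], str]] = {}


def handles(name: str):
    """Register a handler under a case-insensitive header name."""
    def register(fn: Callable[[str], str]) -> Callable[[str], str]:
        HANDLERS[name.lower()] = fn
        return fn
    return register


@handles("Content-Length")
def content_length(value: str) -> str:
    # Content-Length is a plain decimal number; ';' has no meaning here.
    digits = value.strip()
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"bad Content-Length: {value!r}")
    return digits


def process_header(name: str, value: str) -> str:
    """Rewrite known headers; return unknown ones exactly as received."""
    handler = HANDLERS.get(name.lower())
    if handler is None:
        return value  # unknown header: treat the value as opaque
    return handler(value)
```

The point of the design is the fallback branch: an unrecognised header never gets split, unquoted or re-encoded, so it cannot be damaged on the way through.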

The status quo today is that non-ASCII characters in header values don't work well enough cross-browser to be used at all, either encoded or raw.

Luckily they are rarely needed. The only really common use case is non-ASCII filenames for Content-Disposition, but that's easier to work around by putting the filename in a trailing URL path part instead.
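To illustrate the workaround: the download link carries the percent-encoded filename as its last path segment, and the response sends `Content-Disposition: attachment` without a filename parameter, so the browser falls back to the name in the URL. A small sketch, assuming the server ignores that trailing segment when routing; the `download_url` helper and the example.com URL are made up for this example.

```python
from urllib.parse import quote


def download_url(base: str, filename: str) -> str:
    """Append the filename as a percent-encoded trailing path segment."""
    # quote() percent-encodes the UTF-8 bytes; safe="" also encodes "/"
    # so the filename cannot introduce extra path segments.
    return base.rstrip("/") + "/" + quote(filename, safe="")


print(download_url("https://example.com/download/123", "r\u00e9sum\u00e9.pdf"))
# prints https://example.com/download/123/r%C3%A9sum%C3%A9.pdf
```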

Could the value parser for an HTTP header be considered a MIME parser?

No. HTTP borrows heavily from MIME and the RFC 822 family of standards in general, but it isn't part of the 822 family. It has its own low-level grammar for headers, which looks like 822 but isn't quite compatible. Arbitrary MIME features can't be used in HTTP; there has to be a standardisation mechanism that drags them into HTTP explicitly, which is what RFC 5987 does for (parts of) RFC 2231.

(See section 19.4 of RFC 2616 for discussion of some other differences.)
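For the headers that do adopt it, the RFC 5987 value after `*=` has the shape charset'language'percent-encoded-text. A minimal decoding sketch, restricted here to UTF-8 and ISO-8859-1, the charsets RFC 5987 names; `decode_ext_value` is a hypothetical helper and it does not validate the characters of the encoded part.

```python
from urllib.parse import unquote


def decode_ext_value(ext_value: str) -> tuple[str, str, str]:
    """Decode charset'language'pct-encoded-text into (text, charset, language)."""
    # Unpacking raises ValueError if either of the two "'" delimiters is missing.
    charset, language, encoded = ext_value.split("'", 2)
    if charset.lower() not in ("utf-8", "iso-8859-1"):
        raise ValueError(f"unsupported charset: {charset!r}")
    # Percent-decode the bytes and interpret them in the declared charset.
    text = unquote(encoded, encoding=charset, errors="strict")
    return text, charset, language


print(decode_ext_value("utf-8'en'bof"))
# prints ('bof', 'utf-8', 'en')
```

The language tag may be empty, as in `UTF-8''%e2%82%ac%20rates`, and a proxy that only forwards the value has no need to decode it at all.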

In theory, a multipart form submission is part of the 822 family and you should be able to use RFC 2231 encoding there. But the reality is browsers don't support that either.
