Which encoding does the HTTP protocol use?

Posted on 2024-07-20 01:06:45

When a browser sends an HTTP request to a web server, what encoding is used for the HTTP protocol itself on the wire? Is it ASCII, UTF-8, or UTF-16? Or does the protocol declare, in some predefined format, which encoding it uses (before any decoding takes place)?

P.S. I'm not asking about the actual payload (e.g. HTML) of the request/response. I'm asking about the request line (e.g. GET /index.html HTTP/1.1) and the headers (e.g. Host: google.com).


Comments (2)

听,心雨的声音 2024-07-27 01:06:46

RFC 2616 includes this:

OCTET          = <any 8-bit sequence of data>
CHAR           = <any US-ASCII character (octets 0 - 127)>
UPALPHA        = <any US-ASCII uppercase letter "A".."Z">
LOALPHA        = <any US-ASCII lowercase letter "a".."z">
ALPHA          = UPALPHA | LOALPHA
DIGIT          = <any US-ASCII digit "0".."9">
CTL            = <any US-ASCII control character
                  (octets 0 - 31) and DEL (127)>
CR             = <US-ASCII CR, carriage return (13)>
LF             = <US-ASCII LF, linefeed (10)>
SP             = <US-ASCII SP, space (32)>
HT             = <US-ASCII HT, horizontal-tab (9)>
<">            = <US-ASCII double-quote mark (34)>

Pretty much everything else in the document is defined in terms of those entities (OCTET, CHAR, etc.). So you could look through the RFC to find out which parts of an HTTP request/response can include OCTETs; all other parts must be ASCII. (I'd do it myself, but it'd take a long time.)

For the request line specifically, the method name and HTTP version are going to be ASCII characters only, but it's possible that the URL itself could include non-ASCII characters. However, RFC 2396 says:

"A URI is a sequence of characters from a very limited set, i.e. the letters of the basic Latin alphabet, digits, and a few special characters."

Which I guess means that the URL will consist of ASCII characters as well.
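To illustrate the point about URLs, here is a minimal Python sketch of how a client might keep the request line pure ASCII by percent-encoding any non-ASCII characters in the path first. The helper name build_request_line is hypothetical, and treating the path as UTF-8 before escaping is an assumption of this sketch (it is the default of urllib.parse.quote, not something RFC 2616 mandates).

```python
from urllib.parse import quote


def build_request_line(path: str, method: str = "GET") -> bytes:
    """Hypothetical helper: return an ASCII-only HTTP/1.1 request line."""
    # quote() percent-encodes non-ASCII characters using their UTF-8 bytes
    # (an assumption of this sketch) and leaves "/" untouched by default.
    escaped = quote(path)
    line = f"{method} {escaped} HTTP/1.1\r\n"
    # Strict ASCII encoding raises UnicodeEncodeError if anything slipped through.
    return line.encode("ascii")


request_line = build_request_line("/café")
print(request_line)
```

After escaping, every byte of the request line falls in the 0 - 127 range, which matches the CHAR rule quoted above.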

随波逐流 2024-07-27 01:06:46

HTTP 1.1 uses US-ASCII as the basic character set for the request line in requests, the status line in responses (except the reason phrase), and the field names, but allows any octet in the field values and the message body.
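A rough Python sketch of what that split could look like when reading a message head: the request line and field names are decoded strictly as ASCII, while field values are left as raw bytes. The function name parse_head and the X-Note header are made up for illustration, and this is not a complete or robust HTTP parser.

```python
def parse_head(raw: bytes):
    """Hypothetical helper: split a raw HTTP/1.1 head into (start_line, fields)."""
    head, _, _body = raw.partition(b"\r\n\r\n")
    lines = head.split(b"\r\n")
    # The request line is US-ASCII; strict decoding raises on any byte >= 0x80.
    start_line = lines[0].decode("ascii")
    fields = []
    for line in lines[1:]:
        name, _, value = line.partition(b":")
        # Field names are decoded as ASCII; field values stay as raw octets.
        fields.append((name.decode("ascii"), value.strip(b" \t")))
    return start_line, fields


# X-Note is a made-up header whose value contains a non-ASCII octet (0xE9).
raw = b"GET /index.html HTTP/1.1\r\nHost: google.com\r\nX-Note: caf\xe9\r\n\r\n"
start_line, fields = parse_head(raw)
print(start_line)
print(fields)
```

Keeping the values as bytes avoids guessing a character set for them; how to interpret those octets is left to whatever consumes the field.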
