Submit character encoding: the _charset_ hidden field

Posted 2024-09-08 01:10:34

For our web app, we have multiple HTML pages containing text areas. All of our pages are rendered with an ISO-8859-1 charset. When a page is accessed through IE6 on a Windows machine and special characters such as a "smart quote" are pasted into the text area, some of our pages submit using the Windows 1252 character encoding. Others appear to submit using the UTF-8 character encoding. I've been tracking the submit character encoding by using the following hidden field:

<input type="hidden" name="_charset_" />

On the Windows 1252 submit character encoding pages, we receive a value of "windows-1252".

On the UTF-8 submit character encoding pages, we receive a blank value.

On the backend, we are using ISO-8859-1. While ideally we would want the submit character encoding to be ISO-8859-1 as well, I do not see an option for forcing that behavior on IE 6. Given the choice between Windows 1252 and UTF-8, I would prefer the content be submitted in Windows 1252 so that it is more likely to render correctly when the page re-renders in ISO-8859-1.
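To make the preference above concrete, here is a small sketch of how a single "smart quote" behaves under the three encodings involved. The variable and function names are made up for illustration, and the remark about rendering relies on the common browser behavior of treating the ISO-8859-1 label as Windows 1252, which is assumed here and not tested on IE 6.

```python
# A left "smart quote" (U+201C), as typically pasted from Word on Windows.
SMART_QUOTE = "\u201c"

# Windows 1252 stores it in one byte; UTF-8 needs three.
cp1252_bytes = SMART_QUOTE.encode("cp1252")
utf8_bytes = SMART_QUOTE.encode("utf-8")


def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Strict ISO-8859-1 has no smart quote at all.
latin1_has_quote = can_encode(SMART_QUOTE, "iso-8859-1")

# What a strict ISO-8859-1 backend sees for each submission:
# one C1 control character for the Windows 1252 bytes,
# three unrelated characters for the UTF-8 bytes.
seen_from_cp1252 = cp1252_bytes.decode("iso-8859-1")
seen_from_utf8 = utf8_bytes.decode("iso-8859-1")

print(cp1252_bytes.hex(), utf8_bytes.hex(), latin1_has_quote)
print(len(seen_from_cp1252), len(seen_from_utf8))
```

The Windows 1252 submission survives a round trip through an ISO-8859-1 backend as a single byte, which a browser that maps ISO-8859-1 to Windows 1252 can still display as a quote. The UTF-8 submission becomes three separate characters when misread the same way.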

I've looked into our pages in some depth, and nothing jumps out at me as the reason why some pages submit in one character encoding and others in another.

1) When IE 6 returns a blank _charset_ value, does that in fact equate to UTF-8? Does IE 6 always return a blank value when the submit character encoding is UTF-8, or only when it is unable to determine which character encoding to use?

2) What possible differences could there be on the pages that would result in IE 6 picking Windows 1252 on some pages and UTF-8 on others? I scanned the page for UTF-8 characters and for any accept-charset attributes and could not find either.

Additional Note: I found the information on the charset hidden input at the following link.

http://web.archive.org/web/20060427015200/ppewww.ph.gla.ac.uk/~flavell/charset/form-i18n.html

Comments (2)

自由范儿 2024-09-15 01:10:34

MSDN states that IE only accepts "utf-8" as a value for this attribute (presumably the form's accept-charset attribute).

烏雲後面有陽光 2024-09-15 01:10:34

The hidden field named _charset_ gets special treatment from HTML5-conforming clients:

[...] An ASCII case-insensitive match for the name _charset_ is special: if
used as the name of a Hidden control with no value attribute, then
during submission the value attribute is automatically given a value
consisting of the submission character encoding.

The submission character encoding is selected according to the following algorithm:

If the user agent is to pick an encoding for a form, it
must run the following steps:

  1. Let encoding be the document's character encoding.

  2. If the form element has an accept-charset attribute, set encoding to
    the return value of running these substeps:

    1. Let input be the value of the form element's accept-charset attribute.

    2. Let candidate encoding labels be the result of splitting input on
      ASCII whitespace.

    3. Let candidate encodings be an empty list of character encodings.

    4. For each token in candidate encoding labels in turn (in the order in
      which they were found in input), get an encoding for the token and, if
      this does not result in failure, append the encoding to candidate
      encodings.

    5. If candidate encodings is empty, return UTF-8.

    6. Return the first encoding in candidate encodings.

  3. Return the result of getting an output encoding from encoding.
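The steps above can be sketched roughly as follows. This is only an illustration: Python's `codecs.lookup` stands in for the spec's "get an encoding" (its label table is not the same as the one browsers use), the function names are hypothetical, and "getting an output encoding" is approximated by mapping UTF-16 variants to UTF-8.

```python
import codecs
from typing import Optional


def get_encoding(label: str) -> Optional[str]:
    """Stand-in for "get an encoding": a canonical codec name, or None on failure.

    Assumption: Python's codec registry approximates the browser label table.
    """
    try:
        return codecs.lookup(label.strip()).name
    except LookupError:
        return None


def get_output_encoding(encoding: str) -> str:
    """Approximate "get an output encoding": UTF-16 variants become UTF-8."""
    if encoding in ("utf-16", "utf-16-le", "utf-16-be"):
        return "utf-8"
    return encoding


def pick_form_encoding(document_encoding: str,
                       accept_charset: Optional[str] = None) -> str:
    # Step 1: start from the document's character encoding.
    encoding = get_encoding(document_encoding) or "utf-8"

    # Step 2: an accept-charset attribute, if present, overrides it.
    if accept_charset is not None:
        # str.split() also splits on non-ASCII whitespace; close enough for a sketch.
        candidates = []
        for token in accept_charset.split():
            candidate = get_encoding(token)
            if candidate is not None:
                candidates.append(candidate)
        # Empty candidate list means UTF-8; otherwise the first valid label wins.
        encoding = candidates[0] if candidates else "utf-8"

    # Step 3: convert to an output encoding.
    return get_output_encoding(encoding)


print(pick_form_encoding("windows-1252"))
print(pick_form_encoding("iso-8859-1", "bogus-label"))
```

Note that without an accept-charset attribute the result is simply the document's encoding, so under this algorithm a page would only fall back to UTF-8 when the attribute is present but contains no usable label.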

So I think that if you do not receive a _charset_ form parameter at the backend, you should assume the character encoding is UTF-8.
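A backend could apply that fallback along the lines below. This is a minimal sketch, assuming a URL-encoded POST body and a hypothetical helper name `decode_form`; whether a blank value from IE 6 really means UTF-8 is not settled by the spec text quoted above, so the default is a parameter.

```python
import codecs
from urllib.parse import parse_qs


def decode_form(body: bytes, default: str = "utf-8"):
    """Decode a URL-encoded form body using its _charset_ field.

    Falls back to `default` when _charset_ is missing, blank, or unknown.
    Returns (encoding_name, fields).
    """
    # A URL-encoded body is plain ASCII; non-ASCII bytes are percent-escaped.
    raw = body.decode("ascii")

    # First pass with latin-1, which never fails, just to read _charset_.
    probe = parse_qs(raw, keep_blank_values=True, encoding="latin-1")
    label = probe.get("_charset_", [""])[0].strip()

    encoding = default
    if label:
        try:
            encoding = codecs.lookup(label).name
        except LookupError:
            pass  # unknown label: keep the default

    # Second pass with the encoding the browser said it used.
    fields = parse_qs(raw, keep_blank_values=True,
                      encoding=encoding, errors="replace")
    return encoding, fields


enc, fields = decode_form(b"_charset_=windows-1252&text=%93hi%94")
print(enc, fields["text"])
```

Parsing twice keeps the sketch simple: the first pass only needs the ASCII label, and the second pass decodes the percent-escaped bytes with the chosen encoding.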
