Node.js and UTF-8 in POST data

Posted 2024-12-10 06:55:59

I am having problems decoding UTF-8 strings in POST data when using the Node.JS web server.

See this complete test case:

require("http").createServer(function(request, response) {

  if (request.method != "POST") {

    response.writeHead(200, {'Content-Type': 'text/html; charset=utf-8'});
    response.end('<html>'+
      '<head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head>'+
      '<body>'+
      '<form method="post">'+
      '<input name="test" value="Grüße!"><input type="submit">'+
      '</form></body></html>');

  } else {

    console.log("CONTENT TYPE=",request.headers['content-type']);

    var body="";
    request.on('data', function (data) {
      body += data;
    });

    request.on('end', function () {
      console.log("POST BODY=",body);

      response.writeHead(200, {'Content-Type': 'text/plain; charset=utf-8'});
      response.end("POST DATA:\n"+body+"\n---\nUNESCAPED:\n"+unescape(body)+
        "\n---\nHARDCODED: Grüße!");
    });

  }

}).listen(11180);

This is a standalone web server that listens on port 11180 and sends an HTML page with a simple form containing an input field with special characters. POSTing that form to the server echoes its contents in a plain-text response.

My problem is that the special characters are not displayed properly, either on the console or in the browser. This is what I see with both Firefox and IE:

POST DATA:
test=Gr%C3%BC%C3%9Fe%21
---
UNESCAPED:
test=GrÃ¼Ãe!
---
HARDCODED: Grüße!

The last line is a hardcoded string, Grüße!, that should match the value of the input field (to verify that this is not a display problem). Evidently the POST data is not being interpreted as UTF-8. The same problem happens when using require('querystring') to break the data into fields.

Any clue?

I am using Node.JS v0.4.11 on Debian Linux 4; the source code is saved as UTF-8.

Comments (1)

旧夏天 2024-12-17 06:55:59

The characters ü and ß are not in the ASCII charset. In UTF-8 each of them is encoded as two bytes, and the URL-encoded form body writes each of those bytes as a separate %XX escape (%C3%BC and %C3%9F).
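
To make the byte-level picture concrete, here is a minimal sketch. It assumes a current Node.js release where Buffer.from() is available (it did not exist in v0.4.11), and the variable names are only illustrative:

```javascript
// "\u00fc" is ü and "\u00df" is ß; both lie outside ASCII,
// so UTF-8 needs two bytes for each of them.
const utf8Bytes = Buffer.from('\u00fc\u00df', 'utf8');
const hexBytes = utf8Bytes.toString('hex'); // 'c3bcc39f'

// Percent-encoding writes every one of those bytes as %XX.
const encodedWord = encodeURIComponent('Gr\u00fc\u00dfe'); // 'Gr%C3%BC%C3%9Fe'

console.log(hexBytes, encodedWord);
```

The escapes %C3%BC%C3%9F in the POST body are exactly these four bytes, which is why the body itself is plain ASCII and only the decoding step can go wrong.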

According to http://www.w3.org/TR/html4/interact/forms.html#h-17.13.4.1

The content type "application/x-www-form-urlencoded" is inefficient
for sending large quantities of binary data or text containing
non-ASCII characters. The content type "multipart/form-data" should be
used for submitting forms that contain files, non-ASCII data, and
binary data.

Switching the enctype on your form to multipart, <form method="post" enctype="multipart/form-data">, makes the browser send the text as UTF-8 characters rather than percent-escapes. You then have to parse the multipart format. node-formidable seems to be the most popular library for doing so.

It's probably much simpler to use decodeURIComponent(), as you mentioned in a comment. unescape() does not handle multibyte characters; it turns each escaped byte into its own character, hence the garbling you're seeing. See http://xkr.us/articles/javascript/encode-compare/
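
A small sketch of the difference, using the exact body from the question. It assumes a current Node.js release; the question reports that querystring in v0.4.11 showed the same garbling, so the querystring line below reflects newer versions only:

```javascript
const querystring = require('querystring');

// The raw body the browser sent in the question.
const postBody = 'test=Gr%C3%BC%C3%9Fe%21';

// unescape() maps every %XX to the code point XX, so each UTF-8 byte
// ends up as its own Latin-1 character.
const garbled = unescape(postBody);

// decodeURIComponent() gathers the escaped bytes and decodes them as UTF-8.
const decoded = decodeURIComponent(postBody);

// querystring.parse() splits the body into fields and decodes each value.
const fields = querystring.parse(postBody);

console.log(decoded);     // test=Grüße!
console.log(fields.test); // Grüße!
```

For real form data querystring.parse() is the safer choice, because form encoding writes spaces as "+" and decodeURIComponent() leaves those untouched.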

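You can also use buffers to change the encoding. That is overkill in this case, but if you need to (here myString is the garbled string, with one character per byte):

// new Buffer() is deprecated in current Node.js; Buffer.from() replaces it
Buffer.from(myString, 'latin1').toString('utf8');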
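
As a worked illustration of the buffer approach, here is a sketch that repairs the string unescape() produced in the question. It assumes a current Node.js release with Buffer.from() and the 'latin1' encoding name, neither of which existed in v0.4.11, and the variable names are made up for the example:

```javascript
// What unescape() produced in the question: one character per UTF-8 byte.
const garbledValue = 'Gr\u00c3\u00bc\u00c3\u009fe!';

// 'latin1' writes each character (U+0000..U+00FF) back as a single byte...
const rawBytes = Buffer.from(garbledValue, 'latin1');

// ...and decoding those bytes as UTF-8 restores the intended text.
const repaired = rawBytes.toString('utf8');

console.log(repaired); // Grüße!
```

This only works when every character of the input is in the range U+0000 to U+00FF, which holds for unescape() output of %XX escapes.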