Why does UTF8 data in my mod_perl application still appear garbled in the web browser?

Posted 2024-08-10 18:15:13

Before I begin, I would like to highlight the structure of what I am working with.

  1. There is a text file from which a specific text is taken. The file is encoded in utf-8
  2. Perl takes the file and prints it into a page. Everything is displayed as it should be. Perl is set to use utf-8
  3. The web page Perl generates has the following header <meta content="text/html;charset=utf-8" http-equiv="content-type"/>. Hence it is utf-8
  4. After the first load, everything is loaded dynamically via jQuery/AJAX. By flipping through pages, it is possible to load the exact same text, only this time it is loaded by JavaScript. The Request has following header Content-Type: application/x-www-form-urlencoded; charset=UTF-8
  5. The Perl handler which processes the AJAX Request on the Backend delivers contents in utf-8
  6. The AJAX Handler calls up a function in our custom Framework. Before the Framework prints out the text, it is displayed correctly as "üöä". After being sent to the AJAX Handler, it reads "\x{c3}\x{b6}\x{c3}\x{a4}\x{c3}\x{bc}", which is the UTF-8 byte representation of "öäü" (the same three characters).
  7. After the AJAX Handler delivers its package to the client as JSON, the webpage prints the following: "Ã¶Ã¤Ã¼".
  8. The JS and Perl files themselves are saved in utf-8 (default setting in Eclipse)

These are the symptoms. I tried everything Google told me and I still have the problem. Does anyone have a clue what it could be? If you need any specific code snippet, tell me so and I'll try to paste it.

Edit 1

The Response Header from the AJAX Handler

Date: Mon, 09 Nov 2009 11:40:27 GMT
Server: Apache/2.2.10 (Linux/SUSE)
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html; charset="utf-8"

200 OK

Answer

With the help of you folks and this page, I was able to track down the problem. Seems like the problem was not the encoding by itself, but rather Perl encoding my variable $text twice as utf-8 (according to the site). The solution was as simple as adding Encode::decode_utf8().
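A minimal sketch of how double encoding arises and why decoding first fixes it. The byte string below is an assumption standing in for `$text` as the handler received it; the actual framework code is not shown in the question.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8);

# Raw UTF-8 bytes for "öäü", as in step 6 of the question. This stands in
# for data read without any decoding step (an assumption of this sketch).
my $bytes = "\xc3\xb6\xc3\xa4\xc3\xbc";

# Encoding bytes that are already UTF-8 treats each byte as a Latin-1
# character and encodes it again, so 6 bytes become 12.
my $double = encode_utf8($bytes);

# Decoding first turns the bytes into three characters; encoding once on
# output then gives back the original 6 bytes.
my $chars = decode_utf8($bytes);
my $once  = encode_utf8($chars);

printf "bytes in:       %d\n", length $bytes;    # 6
printf "double-encoded: %d\n", length $double;   # 12
printf "characters:     %d\n", length $chars;    # 3
printf "encoded once:   %d\n", length $once;     # 6
```

The general rule this illustrates: decode once where bytes enter the program, work with character strings inside it, and encode once where data leaves it.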

I was searching in the completely wrong place to begin with. I thank you all who helped me search in the right place :)

#spreads some upvote love#

Comments (2)

谈下烟灰 2024-08-17 18:15:13

returns the following: &#38;&#65;&#116;&#105;&#108;&#100;&#101;&#59;&#38;&#112;&#97;&#114;&#97;&#59;...

That's:

&Atilde;&para;...

Which says your AJAX handler is using an HTML-entity-encoding function for its output, that is assuming input from the ISO-8859-1 character set. You could use a character-reference encoder that knew about UTF-8 instead, but probably it will be easier just to encode the potentially-special characters <>&"' and no others.
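As a rough sketch of that second option, the helper below escapes only the five special characters and leaves everything else alone. The name `escape_html_minimal` is hypothetical, and the input is assumed to be a decoded character string rather than raw bytes.

```perl
use strict;
use warnings;
use utf8;                       # this source contains literal UTF-8 text
use Encode qw(encode_utf8);

# Replacement table for the five characters that are special in HTML.
my %ESCAPE = (
    '&' => '&amp;',
    '<' => '&lt;',
    '>' => '&gt;',
    '"' => '&quot;',
    "'" => '&#39;',
);

# Hypothetical helper: escape only those five characters, so no
# assumption about the character set of the remaining text is needed.
sub escape_html_minimal {
    my ($text) = @_;
    $text =~ s/([&<>"'])/$ESCAPE{$1}/g;
    return $text;
}

my $escaped = escape_html_minimal('<b>öäü</b> & "more"');

# Encode to UTF-8 bytes once, at the output boundary.
print encode_utf8($escaped), "\n";
```

Because "öäü" passes through untouched, the result stays correct as long as the response really is sent as UTF-8.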

The Request has following header Content-Type: application/x-www-form-urlencoded; charset=UTF-8

There is no such parameter as charset for the MIME type application/x-www-form-urlencoded. This will be ignored. Form-encoded strings are inherently byte-based; it is up to the application to decide what character set they are treated as (if any; maybe the application does just want bytes).
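To illustrate the byte-based nature of form data, here is a small sketch that percent-decodes a value by hand and only then interprets the bytes as UTF-8. The parameter name `text` and the raw string are made up for the example; a real handler would normally let its request library do the first step.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical form body: "öäü" submitted from a UTF-8 page,
# percent-encoded byte by byte.
my $raw = 'text=%C3%B6%C3%A4%C3%BC';

my ($name, $value) = split /=/, $raw, 2;

# Step 1: undo the form encoding. The result is a string of bytes.
$value =~ tr/+/ /;
$value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;

# Step 2: the application decides to treat those bytes as UTF-8.
# FB_CROAK dies on malformed input; LEAVE_SRC keeps $value unmodified.
my $text = decode('UTF-8', $value, Encode::FB_CROAK | Encode::LEAVE_SRC);

printf "%d bytes, %d characters\n", length $value, length $text;  # 6 bytes, 3 characters
```

Nothing in the request tells the server to perform step 2; it is a choice the application makes.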

2024-08-17 18:15:13

This isn't an answer so much as a suggestion for debugging. The first thing that springs to mind is to try sending HTML entities like &#1234; instead of UTF-8 codes. To make Perl send these there is surely a module, or you can just do

 $text =~ s/(.)/"&#" . ord($1) . ";"/ge;

It seems to me that the most likely cause of this problem is that the JavaScript receiving end is not able to understand the encoded UTF-8 from Perl.
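Since the handler in the question returns JSON, a similar ASCII-only debugging aid is to have the JSON encoder escape non-ASCII characters. This is a sketch using the core JSON::PP module; whether the original handler used JSON::PP is not stated, and the `text` key is an assumption.

```perl
use strict;
use warnings;
use JSON::PP;

# Character string (not bytes) for "öäü", assumed to be already decoded.
my $text = "\x{f6}\x{e4}\x{fc}";

# ascii() escapes every non-ASCII character as \uXXXX, so the payload
# contains only ASCII bytes whatever charset the response declares.
my $json = JSON::PP->new->ascii->encode({ text => $text });

print "$json\n";
```

If the page shows the right characters with this output but not with raw UTF-8, the problem lies in how the bytes are encoded or labelled on the way out, not in the JavaScript.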
