Why does UTF8 data in my mod_perl application still appear garbled in the web browser?
Before I begin, I would like to highlight the structure of what I am working with.
- There is a text file from which a specific text is taken. The file is encoded in utf-8
- Perl takes the file and prints it into a page. Everything is displayed as it should be. Perl is set to use utf-8
- The web page Perl generates has the following header, hence it is utf-8:
<meta content="text/html;charset=utf-8" http-equiv="content-type"/>
- After the first load, everything is loaded dynamically via jQuery/AJAX. By flipping through pages, it is possible to load the exact same text, only this time it is loaded by JavaScript. The request has the following header:
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
- The Perl handler which processes the AJAX Request on the Backend delivers contents in utf-8
- The AJAX Handler calls up a function in our custom Framework. Before the Framework prints out the text, it is displayed correctly as "üöä". After being sent to the AJAX Handler, it reads "\x{c3}\x{b6}\x{c3}\x{a4}\x{c3}\x{bc}", which is the utf-8 byte representation of those characters (in the order "öäü").
- After the AJAX Handler delivers its package to the client as JSON, the webpage prints the following: "öäü".
- The JS and Perl files themselves are saved in utf-8 (default setting in Eclipse)
These are the symptoms. I tried everything Google told me and I still have the problem. Does anyone have a clue what it could be? If you need any specific code snippet, tell me so and I'll try to paste it.
Edit 1
The Response Header from the AJAX Handler
Date: Mon, 09 Nov 2009 11:40:27 GMT
Server: Apache/2.2.10 (Linux/SUSE)
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html; charset="utf-8"
200 OK
Answer
With the help of you folks and this page, I was able to track down the problem. Seems like the problem was not the encoding by itself, but rather Perl encoding my variable $text twice as utf-8 (according to the site). The solution was as simple as adding Encode::decode_utf8().
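The double encoding described above can be reproduced in isolation. The following is only a minimal sketch, not the code from the original application: the variable names are made up for illustration, and it assumes the text passes through encode_utf8 twice before output.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8);

# Character string as the framework sees it: U+00FC, U+00F6, U+00E4 ("üöä").
my $text = "\x{fc}\x{f6}\x{e4}";

# First encode: 3 characters become 6 UTF-8 bytes. This is the correct output.
my $bytes = encode_utf8($text);

# The bug (assumed): the bytes are treated as characters and encoded again,
# so each of the 6 bytes becomes 2 bytes.
my $double = encode_utf8($bytes);

# The fix: remove the extra layer with decode_utf8 before sending the data.
my $fixed = decode_utf8($double);

printf "%d %d %s\n", length($bytes), length($double),
    ($fixed eq $bytes ? "fixed" : "still broken");
```

Run as a script, this prints "6 12 fixed": the double-encoded string is twice as long as the correct UTF-8, and one call to decode_utf8 restores the correct byte sequence.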
I was searching in the completely wrong place to begin with. I thank you all who helped me search in the right place :)
#spreads some upvote love#
Comments (2)
That's:
Which says your AJAX handler is using an HTML-entity-encoding function for its output that assumes input from the ISO-8859-1 character set. You could use a character-reference encoder that knows about UTF-8 instead, but it will probably be easier just to encode the potentially special characters
<>&"'
and no others.

There is no such parameter as charset for the MIME type application/x-www-form-urlencoded. This will be ignored. Form-encoded strings are inherently byte-based; it is up to the application to decide what character set they are treated as (if any; maybe the application does just want bytes).
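Escaping only the five special characters, as suggested in this comment, might look like the sketch below. The helper name escape_html_minimal and the lookup table are hypothetical; all other characters, including "üöä", pass through unchanged.

```perl
use strict;
use warnings;

# Hypothetical lookup table: only the five HTML-special characters.
my %ESCAPE = (
    '&' => '&amp;',
    '<' => '&lt;',
    '>' => '&gt;',
    '"' => '&quot;',
    "'" => '&#39;',
);

# Hypothetical helper: escape the special characters and nothing else,
# so no assumption about the character set of the rest is needed.
sub escape_html_minimal {
    my ($text) = @_;
    $text =~ s/([&<>"'])/$ESCAPE{$1}/g;
    return $text;
}

my $escaped = escape_html_minimal(qq{<b>"\x{fc}\x{f6}\x{e4}" & 'more'</b>});
```

Because the function never touches characters outside the table, it behaves the same whether it is given a decoded character string or raw UTF-8 bytes.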
This isn't an answer so much as a suggestion for debugging. The first thing that springs to mind is to try sending HTML entities like
&#1234;
instead of utf-8 codes. To make Perl send these there is surely a module or you can just do

The thing which seems to me the most likely cause of this problem is that the JavaScript receiving end is not able to understand the encoded UTF-8 from Perl.
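The commenter's own snippet did not survive on this page. As a debugging aid in the same spirit, one possible way to turn non-ASCII characters into numeric character references is sketched below. The helper name to_numeric_refs is hypothetical, and it assumes the input is an already-decoded character string rather than raw UTF-8 bytes.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every non-ASCII character with a decimal
# numeric character reference such as &#252;. ASCII is left untouched.
# Assumes $text holds characters, not UTF-8 encoded bytes.
sub to_numeric_refs {
    my ($text) = @_;
    $text =~ s/([^\x00-\x7F])/sprintf('&#%d;', ord($1))/ge;
    return $text;
}

my $refs = to_numeric_refs("\x{fc}\x{f6}\x{e4} ok");
print "$refs\n";
```

If the references display correctly in the browser while the raw UTF-8 does not, the fault lies in how the bytes are encoded or labeled on the way out, not in the text itself.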