Can't parse and display non-UTF-8 characters read from an HTTP request
I'm using Java to parse this request, which returns the following JSON (truncated for the sake of brevity):
{"responseData":{"results":
<...>
"visibleUrl":"www.coolcook.net",
"cacheUrl":"http://www.google.com/search?q\u003dcache:p4Ke5q6zpnUJ:www.coolcook.net",
"title":"مطبخ مطايب - كباب الدجاج والخضار بصلصة الروب",
"titleNoFormatting":"مطبخ مطايب - كباب الدجاج والخضار بصلصة الروب","\u003drz+img+news+recordid+border"}},
<...>
"responseDetails": null, "responseStatus": 200}
My problem lies in the Arabic characters returned (which could be any non-Unicode characters, for that matter). I tried to convert them back to Unicode using something like:
JSONArray ja = json.getJSONObject("responseData").getJSONArray("results");
JSONObject j = ja.getJSONObject(i);
str = j.getString("titleNoFormatting");
logger.log("before: " + str); // this is just my version of println
enc_str = new String (str.getBytes(), "UTF8");
logger.log("after: " + enc_str);
However, both the 'before' and 'after' results are the same: a string of ????, regardless of whether I output them to the server log file or to an HTML page. Is there another way to get the Arabic characters back and output them in a web page?
Does JSON have any supporting functionality for this sort of problem, perhaps to read the non-UTF characters straight from the JSONObject?
The issue you have is most likely caused by an incorrect character encoding setting at the point where you read in the HTTP response from Google. Can you post the code that actually gets the URL and parses it into the JSON object?
As an example, run the following:
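A minimal sketch of what such an example might look like; this is not the answerer's original code. The class name Utf8Fetch and its two helper methods are made up for illustration, and the URL is expected as a command-line argument. The point it tries to show is that the charset is passed explicitly to InputStreamReader, so the platform default is never consulted:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class Utf8Fetch {

    // Decode the whole stream with an explicitly chosen charset.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    // Fetch a URL and decode the body as UTF-8, ignoring the platform default.
    static String fetch(String address) throws IOException {
        URLConnection conn = URI.create(address).toURL().openConnection();
        try (InputStream in = conn.getInputStream()) {
            return readAll(in, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java Utf8Fetch <url>");
            return;
        }
        String body = fetch(args[0]);
        // Hand this string to the JSON parser; it is already correctly decoded.
        System.out.println(body.length() + " characters read");
    }
}
```

The sketch assumes the server really sends UTF-8. A more careful version would read the charset from the Content-Type response header and fall back to UTF-8 only when none is given.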
This uses the rather ugly standard URL.openConnection() that's been around since the dawn of time. If you are using something like Apache HttpClient then you can do this really easily.
For a bit of background reading on encoding, and maybe an explanation of why new String(str.getBytes(), "UTF8"); will never work, read Joel's article on Unicode.
I think the JSON.org Java JSON package cannot handle UTF-8, whether it is passed in as a UTF-8 character or as the \uXXXX code. I tried both as follows:
I get:
Any ideas?
The important part of the problem is how you are handling the content of the HTTP response. That is, how are you creating the json object? By the time you get to the code in your original post, the content has already been corrupted.
The request results in UTF-8 encoded data. How are you parsing it into JSON objects? Is the correct encoding specified to the decoder? Or is your platform's default character encoding being used?
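To illustrate why the content is "already corrupted" by that point, here is a small sketch. The class LossyDecodeDemo and its methods are hypothetical, and US-ASCII stands in for whatever non-UTF-8 default a platform might have. Once the bytes have been decoded with the wrong charset, the original characters are replaced and no later re-encoding can bring them back:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
class LossyDecodeDemo {

    // What a reader does when it turns response bytes into a String.
    static String decode(byte[] wire, Charset charset) {
        return new String(wire, charset);
    }

    // The attempted repair from the question, with the charset made explicit.
    static String reencode(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source independent of the file encoding.
        String original = "\u0643\u0628\u0627\u0628";
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);

        String good = decode(wire, StandardCharsets.UTF_8);
        String bad = decode(wire, StandardCharsets.US_ASCII); // non-ASCII bytes become U+FFFD
        String repaired = reencode(bad);

        System.out.println("decoded as UTF-8 matches original: " + good.equals(original));
        System.out.println("repair recovers original:          " + repaired.equals(original));
        System.out.println("repair changes nothing:            " + repaired.equals(bad));
    }
}
```

The fix therefore belongs where the response bytes are first decoded, not after the JSON object has been built.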
First try this:
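A minimal sketch of such a check, not the answerer's original code. The class Utf8FileDump is made up for illustration, and the hard-coded title is only a stand-in for the string you get from j.getString("titleNoFormatting"). It writes the string to a file with an explicit UTF-8 writer so that neither the logger nor the console is involved:

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class, for illustration only.
class Utf8FileDump {

    // Write the text to the target file, always encoded as UTF-8.
    static void dump(String text, Path target) throws IOException {
        try (Writer out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            out.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the parsed title; escapes avoid depending on source encoding.
        String title = "\u0645\u0637\u0628\u062e \u0645\u0637\u0627\u064a\u0628";
        // Use the path given on the command line, or a temporary file otherwise.
        Path target = args.length > 0
                ? Paths.get(args[0])
                : Files.createTempFile("title-check", ".txt");
        dump(title, target);
        System.out.println("Wrote " + Files.size(target) + " bytes to " + target.toAbsolutePath());
    }
}
```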
Then open the file in Notepad. If it looks fine, the problem lies in your logger or console, which is not configured to use UTF-8. Otherwise, the problem most likely lies in the JSON API you used, which is not configured to use UTF-8.
Edit: if the problem is actually in the JSON API used and you don't know which one to choose, then I'd recommend Gson. It really eases converting a JSON string to an easy-to-use JavaBean. Here's a basic example:
It outputs the results nicely. Hope this helps.
There is a library which retains the encoding of the HTTP response (Czech expressions) with a JSON message like this:
The answer is tricky and there are a few points one must pay attention to, mainly the platform encoding:
AFAIK it affects printing to the console, creating files from an input stream, and even communication between the DB client and server, even when both are set to use the UTF-8 charset. No matter whether I explicitly create a UTF-8 string or InputStreamReader, or set the JDBC driver to UTF-8, I still have to set the $LANG property to xx_XX.UTF-8 on Linux systems and add append=" vt.default_utf8=1" to the LILO boot loader (on systems that use it), at least on systems running the database and Java apps that work with UTF-8 encoded files.
Even when I append the JVM parameter -Dfile.encoding=UTF-8, I did not succeed in getting properly encoded streams without the platform encoding set. Having the JDBC connector set up properly is also necessary if you are going to persist the strings to a database: "jdbc:mysql://localhost/DBname?useUnicode=true&characterEncoding=UTF8". The database should be in this state:
The Google API correctly sends UTF-8. I think the problem is that your default encoding is not capable of outputting Arabic. Check your file.encoding property, or get the encoding like this:
If the default encoding is ASCII or Latin-1, you will get "?"s. You need to change it to UTF-8.
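One way to run the check described in this answer, as a hedged sketch rather than the answerer's original snippet. The class DefaultEncodingCheck and the method throughCharset are made up for illustration. It prints the JVM's default encoding and then shows what happens to an Arabic string when it passes through ASCII or Latin-1, which is what a console or log writer using such a default would do:

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
class DefaultEncodingCheck {

    // Encode and decode through one charset, as an output writer would.
    static String throughCharset(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());

        // Unicode escapes keep this source independent of the file encoding.
        String title = "\u0643\u0628\u0627\u0628";

        // Characters the charset cannot represent are replaced with '?'.
        System.out.println("via US-ASCII:   " + throughCharset(title, StandardCharsets.US_ASCII));
        System.out.println("via ISO-8859-1: " + throughCharset(title, StandardCharsets.ISO_8859_1));

        // Writing through an explicit UTF-8 stream avoids the default entirely;
        // whether the terminal then renders it depends on the terminal itself.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println("via UTF-8:      " + title);
    }
}
```

If the first two lines report something other than UTF-8, starting the JVM with -Dfile.encoding=UTF-8 or wrapping the output stream as in the last step are two possible remedies.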