Encoding error when deserializing a JSON object from Google

Posted 2024-10-06 07:19:43

As an exercise, I built a little script that queries the Google Suggest JSON API. The code is quite simple:

import json
import urllib  # Python 2; urlopen lives in urllib.request in Python 3

query = 'a'
url = "http://clients1.google.co.jp/complete/search?hl=ja&q=%s&json=t" % query
response = urllib.urlopen(url)
result = json.load(response)
# raises:
# UnicodeDecodeError: 'utf8' codec can't decode byte 0x83 in position 0: invalid start byte

If I read() the response object instead, this is what I get:

'["a",["amazon","ana","au","apple","adobe","alc","\x83A\x83}\x83]\x83\x93","\x83A\x83\x81\x83u\x83\x8d","\x83A\x83X\x83N\x83\x8b","\x83A\x83\x8b\x83N"],["","","","","","","","","",""]]'

So it seems the error is raised when Python tries to decode the string. This only happens with google.co.jp and the Japanese language: when I try the same code with other countries/languages, I don't get the issue and deserializing the object works fine.
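The diagnosis can be reproduced offline. Below is a minimal Python 3 sketch that decodes the first Japanese suggestion from the body above as UTF-8; the helper `utf8_error` is a hypothetical name introduced only for illustration.

```python
def utf8_error(data):
    """Return the UnicodeDecodeError raised by decoding data as UTF-8, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc
    return None


# First Japanese suggestion from the response body shown above
raw = b"\x83A\x83}\x83]\x83\x93"

err = utf8_error(raw)
print(err.reason, "at position", err.start)  # invalid start byte at position 0
```

Byte 0x83 is 0b10000011, a UTF-8 continuation byte, so a strict UTF-8 decoder rejects it as the first byte of a character. Python 2's `json.load` decodes byte strings as UTF-8 by default, which is why it fails on the same byte.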

I checked the response headers, and they always specify utf-8 as the response encoding.
I also checked the JSON string with an online parser (http://json.parser.online.fr/), and again everything seems OK.

Any ideas how to solve this problem? What makes the JSON load() function choke?

Thanks in advance.


硪扪都還晓 2024-10-13 07:19:43

The response headers (print response.info()) contain the following information:

Content-Type: text/javascript; charset=Shift_JIS

Note the charset.

If you specify this encoding in json.load it will work:

result = json.load(response, encoding='shift_jis')
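The `encoding=` argument is a Python 2 feature: Python 3's `json.load` ignored it and it was removed in Python 3.9. A minimal Python 3 sketch of the same idea, assuming the raw body bytes and the Content-Type header value are already in hand; `charset_from_content_type` and `decode_json_body` are hypothetical helper names, and falling back to UTF-8 when no charset is declared is an assumption.

```python
import json
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lowercases the charset, or returns failobj if absent
    return msg.get_content_charset(failobj=default)


def decode_json_body(raw, content_type):
    """Decode raw response bytes with the declared charset, then parse JSON."""
    charset = charset_from_content_type(content_type)
    return json.loads(raw.decode(charset))


# Bytes as returned in the question, shortened to the first Japanese suggestion
raw = b'["a",["amazon","\x83A\x83}\x83]\x83\x93"],["",""]]'
result = decode_json_body(raw, "text/javascript; charset=Shift_JIS")
print(result[1][1])  # アマゾン
```

With a live response, `urllib.request.urlopen(url).headers` is an `http.client.HTTPMessage`, which subclasses `email.message.Message`, so `response.headers.get_content_charset()` can be called on it directly instead of building a `Message` by hand.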


且行且努力 2024-10-13 07:19:43

Whatever the specification says, the string "\x83A\x83}\x83]\x83\x93" is not UTF-8.

At a guess, it is one of [ "cp932", "shift_jis", "shift_jis_2004", "shift_jisx0213" ]; try decoding it with each of those.
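The trial-and-error approach suggested here can be sketched in Python 3 as follows, assuming the whole response body is available as bytes. `decode_first_match` and `CANDIDATES` are hypothetical names, and trying UTF-8 before the Japanese codecs is an added assumption.

```python
import json

# Codecs from the answer above, with UTF-8 tried first (an assumption)
CANDIDATES = ("utf-8", "cp932", "shift_jis", "shift_jis_2004", "shift_jisx0213")


def decode_first_match(raw, candidates=CANDIDATES):
    """Return (codec, parsed_json) for the first codec that decodes and parses raw."""
    for codec in candidates:
        try:
            return codec, json.loads(raw.decode(codec))
        except (UnicodeDecodeError, json.JSONDecodeError):
            continue  # wrong codec, or not valid JSON under it; try the next one
    raise ValueError("no candidate codec could decode the response")


raw = b'["a",["amazon","\x83A\x83}\x83]\x83\x93"],["",""]]'
codec, result = decode_first_match(raw)
print(codec, result[1][1])  # cp932 アマゾン
```

These codecs share most of their mappings (cp932 is Microsoft's variant of Shift_JIS), so more than one may succeed, and a wrong codec can still decode without error and produce mojibake. When the Content-Type header declares a charset, that is the more reliable signal.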

About the author: 迷鸟归林

友情链接

文江博客

反序列化来自 Google 的 json 对象时出现编码错误

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（2）

关于作者

相关话题

热门标签

推荐作者

謌踐踏愛綪

开始看清了

高速公鹿

alipaysp_PLnULTzf66

热情消退

白色月光

友情链接

反序列化来自 Google 的 json 对象时出现编码错误

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（2）

关于作者

相关话题

热门标签

推荐作者

謌踐踏愛綪

开始看清了

高速公鹿

alipaysp_PLnULTzf66

热情消退

白色月光

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。