JSON string decoding error

Posted 2024-11-18 04:30:44

I am calling the URL:

http://code.google.com/feeds/issues/p/chromium/issues/full/291?alt=json

using urllib2 and decoding the response with the json module:

import json
import urllib2

url = "http://code.google.com/feeds/issues/p/chromium/issues/full/291?alt=json"
request = urllib2.Request(url)  # build the request from the URL defined above
response = urllib2.urlopen(request)
issue_report = json.loads(response.read())  # parse the response body as JSON

but I run into the following error:

ValueError: Invalid control character at: line 1 column 1120 (char 1120)

I checked the response headers and got the following:

Content-Type: application/json; charset=UTF-8
Access-Control-Allow-Origin: *
Expires: Sun, 03 Jul 2011 17:38:38 GMT
Date: Sun, 03 Jul 2011 17:38:38 GMT
Cache-Control: private, max-age=0, must-revalidate, no-transform
Vary: Accept, X-GData-Authorization, GData-Version
GData-Version: 1.0
ETag: W/"CUEGQX47eCl7ImA9WxJaFEw."
Last-Modified: Tue, 04 Aug 2009 19:20:20 GMT
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
Server: GSE
Connection: close

I also tried adding an encoding parameter as follows:

issue_report = json.loads(response.read(), encoding='UTF-8')

but I still run into the same error.

歌枕肩 2024-11-25 04:30:44

At that position the feed contains raw data from a JPEG. The JSON is malformed, so this is not your fault. Report the bug to Google.
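To see what the parser is objecting to, you can print the text around the position the error reports. The sketch below is written for Python 3 and uses a small made-up payload, not the real feed: the `raw` string and the `describe_failure` helper are assumptions for illustration. It also shows `strict=False`, a documented option of `json.loads` that accepts raw control characters inside strings. Whether that is enough for this particular feed is not something I can confirm, since embedded binary data may also fail to decode as UTF-8 before the JSON parser ever sees it.

```python
import json

# Hypothetical payload standing in for the feed: a JSON string value that
# contains a raw (unescaped) control character, which strict JSON forbids.
raw = '{"title": "Issue 291", "content": "binary\x01junk"}'


def describe_failure(text, width=5):
    """Return (position, nearby text) if strict parsing fails, else None."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        # exc.pos is the index the parser reports; slice a small window
        # around it so the offending character is visible via repr().
        start = max(0, exc.pos - width)
        return exc.pos, text[start:exc.pos + width]
    return None


failure = describe_failure(raw)
if failure is not None:
    position, context = failure
    print("parse failed near char %d: %r" % (position, context))

# strict=False lets control characters through inside string values.
lenient = json.loads(raw, strict=False)
print(repr(lenient["content"]))
```

If the lenient parse succeeds on the real response, the binary junk ends up inside a string value and the rest of the document is still usable. If it does not, the feed is broken in some other way and the XML route is the safer choice.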

三五鸿雁 2024-11-25 04:30:44

You could consider using lxml instead, since the JSON is malformed. Its XPath support makes working with the XML version of the feed pretty straightforward:

import lxml.etree
url = 'http://code.google.com/feeds/issues/p/chromium/issues/full/291'
doc = lxml.etree.parse(url)
ns = {'issues': 'http://schemas.google.com/projecthosting/issues/2009'}
issues = doc.xpath('//issues:*', namespaces=ns)

Manipulating the elements is fairly easy, for instance stripping the namespace from the tags and converting the result to a dict:

>>> dict((x.tag[len(ns['issues'])+2:], x.text) for x in issues)
{'closedDate': '2009-08-04T19:20:20.000Z',
 'id': '291',
 'label': 'Area-BrowserUI',
 'stars': '13',
 'state': 'closed',
 'status': 'Verified'}
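One caveat with building a plain dict this way: if an element name occurs more than once (several labels on one issue, say), only the last value survives. The sketch below collects values into lists instead. It swaps in the standard library's `xml.etree.ElementTree` for lxml so it runs without extra packages, and it parses a small hand-written sample rather than the live feed. The `SAMPLE` document, its second label, and the `issue_fields` helper are all assumptions for illustration, not the real feed contents.

```python
import xml.etree.ElementTree as ET

ISSUES_NS = "http://schemas.google.com/projecthosting/issues/2009"

# Hypothetical stand-in for the feed entry; the real document has more fields.
SAMPLE = """<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:issues="http://schemas.google.com/projecthosting/issues/2009">
  <title>Sample issue</title>
  <issues:id>291</issues:id>
  <issues:label>Area-BrowserUI</issues:label>
  <issues:label>Type-Bug</issues:label>
  <issues:state>closed</issues:state>
</entry>"""


def issue_fields(xml_text):
    """Map each issues-namespace tag (namespace stripped) to a list of texts."""
    root = ET.fromstring(xml_text)
    prefix = "{%s}" % ISSUES_NS  # ElementTree spells qualified tags as {uri}name
    fields = {}
    for elem in root.iter():
        if elem.tag.startswith(prefix):
            name = elem.tag[len(prefix):]
            # Append rather than overwrite so repeated tags are all kept.
            fields.setdefault(name, []).append(elem.text)
    return fields


fields = issue_fields(SAMPLE)
print(fields)
```

The `{uri}name` tag spelling is the same in lxml, which is why the slice in the answer above adds 2 to the namespace length: it skips the two braces.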
