Python 2 vs. Python 3 - urllib formats

Posted on 2024-09-07 23:43:24

I'm getting really tired of trying to figure out why this code works in Python 2 and not in Python 3. I'm just trying to grab a page of json and then parse it. Here's the code in Python 2:

import urllib, json
response = urllib.urlopen("http://reddit.com/.json")
content = response.read()
data = json.loads(content)

I thought the equivalent code in Python 3 would be this:

import urllib.request, json
response = urllib.request.urlopen("http://reddit.com/.json")
content = response.read()
data = json.loads(content)

But it blows up in my face, because the data returned by read() is a "bytes" type. However, I cannot for the life of me get it to convert to something that json will be able to parse. I know from the headers that reddit is trying to send utf-8 back to me, but I can't seem to get the bytes to decode into utf-8:

import urllib.request, json
response = urllib.request.urlopen("http://reddit.com/.json")
content = response.read()
data = json.loads(content.decode("utf8"))

What am I doing wrong?

Edit: the problem is that I cannot get the data into a usable state; even though json loads the data, part of it is undisplayable, and I want to be able to print the data to the screen.

Second edit: The problem has more to do with print than parsing, it seems. Alex's answer provides a way for the script to work in Python 3, by setting the IO to utf8. But a question still remains: why is it that the code worked in Python 2, but not Python 3?
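
On the remaining "why" question, a likely explanation (hedged, since the failing output is not shown here): printing a parsed structure in Python 2 goes through repr(), which escapes non-ASCII characters, whereas Python 3 leaves printable non-ASCII characters as they are, so print() has to encode them for the console and fails when the console encoding cannot represent them. The sketch below shows two workarounds under that assumption. The helper name make_utf8_writer and the sample data are made up for illustration, and an in-memory buffer stands in for the real console.

```python
import io


def make_utf8_writer(binary_stream):
    """Wrap a binary stream in a text writer that always encodes as UTF-8.

    errors="replace" substitutes anything unencodable instead of raising;
    newline="\n" disables platform-specific newline translation.
    """
    return io.TextIOWrapper(binary_stream, encoding="utf-8",
                            errors="replace", newline="\n")


# Hypothetical sample data standing in for the parsed JSON.
data = {"title": "caf\u00e9 \u2603"}

# Workaround 1: write through a UTF-8 text wrapper.
# For the real console, sys.stdout.buffer would be passed instead of BytesIO.
buffer = io.BytesIO()
writer = make_utf8_writer(buffer)
writer.write(data["title"] + "\n")
writer.flush()

# Workaround 2: ascii() escapes non-ASCII characters, much like Python 2's
# repr() did, so the result is printable on any console.
escaped = ascii(data)
```

On Python 3.7 and later, sys.stdout.reconfigure(encoding="utf-8") achieves the same effect as the wrapper without creating a new stream object.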


Comments (4)

迟月 2024-09-14 23:43:24

The code you posted is presumably the result of a bad cut-and-paste, because it's clearly wrong in both versions (f.read() fails because there's no f barename defined).

In Py3, ur = response.decode('utf8') works perfectly well for me, as does the following json.loads(ur). Maybe the bad copy-and-pastes affected your 2-to-3 conversion attempts.

花开雨落又逢春i 2024-09-14 23:43:24

Depending on your Python version, you have to choose the correct library.

For Python 3.5:

import urllib.request
data = urllib.request.urlopen(url).read().decode('utf8')

For Python 2.7:

import urllib
url = serviceurl + urllib.urlencode({'sensor':'false', 'address': address})   
uh = urllib.urlopen(url)

与风相奔跑 2024-09-14 23:43:24

Please see that answer in another Unicode-related question.

Now: the Python 3 str type (which was the Python 2 unicode type) is an idealised object, in the sense that it deals with “characters”, not “bytes”. To be written to or read from disk or the network, these characters need to be encoded into, or decoded from, bytes by a “conversion table”, a.k.a. an encoding, a.k.a. a codepage. Because of operating system variety, Python historically avoided guessing what that encoding should be; this has been changing over the years, but the principle of “In the face of ambiguity, refuse the temptation to guess.” still applies.

Thankfully, a web server makes your work easier. Your response above should give you all extra information needed:

>>> response.headers['content-type']
'application/json; charset=UTF-8'

So, every time you issue a request to a web server, check the Content-Type header for a charset value, and use that charset to decode the response's data into Unicode (Python 3: bytes.decode(charset) returns a str).
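
A minimal sketch of that advice, assuming Python 3. The helper name decode_body, the UTF-8 fallback, and the sample bytes are illustrative choices rather than part of the answer above. The sample stands in for a real response so the example runs without network access.

```python
import json
from email.message import Message


def decode_body(body, content_type, default="utf-8"):
    """Decode raw response bytes using the charset named in Content-Type.

    Falls back to `default` (an assumption of this sketch) when the header
    carries no charset parameter.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset() or default
    return body.decode(charset)


# Stand-in for response.read() and response.headers['content-type'].
raw = '{"title": "caf\u00e9"}'.encode("utf-8")
text = decode_body(raw, "application/json; charset=UTF-8")
data = json.loads(text)
```

With a live urllib.request response, the headers object is itself an email-style message, so response.headers.get_content_charset() can be called directly instead of parsing the header string by hand.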

丢了幸福的猪 2024-09-14 23:43:24

Here is an approach that is compatible across both versions - it works by first converting bytes data to string, and then loading the string.

import json
try:
    from urllib.request import Request, urlopen #python3+
except ImportError:
    from urllib2 import Request, urlopen        #python2

url = 'https://jsonfeed.org/feed.json'
request = Request(url)
response_json_string = urlopen(request).read().decode('utf8')
response_json_object = json.loads(response_json_string)
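
As a side note to the approach above: on Python 3.6 and later, json.loads also accepts bytes (encoded as UTF-8, UTF-16 or UTF-32), so the explicit decode step is only needed when the same code must also run on Python 2 or Python 3.5 and earlier. A small sketch, using made-up sample bytes in place of a real response:

```python
import json

# Stand-in for urlopen(request).read(); the content is invented for the demo.
raw = '{"title": "caf\u00e9", "score": 42}'.encode("utf-8")

# Python 3.6+: bytes can be passed straight to json.loads.
from_bytes = json.loads(raw)

# Portable form: decode explicitly, then parse the resulting text.
from_text = json.loads(raw.decode("utf-8"))
```

Both calls yield the same dictionary; the explicit decode remains the safer choice when the server may send an encoding other than UTF-8, UTF-16 or UTF-32.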