Decoding problem with urllib2 in Python

Posted 2024-10-02 17:26:51

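I'm trying to use urllib2 in Python 2.7 to fetch a page from the web. The page is encoded in UTF-8 and contains Greek characters. When I fetch and print it with the code below, I get gibberish instead of the Greek characters.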

import urllib2
print urllib2.urlopen("http://www.pamestihima.gr").read()

The result is the same in both Netbeans 6.9.1 and the Windows 7 CLI.

I'm doing something wrong, but what?

始于初秋 2024-10-09 17:26:51

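Unicode is not UTF-8. UTF-8 is a string encoding, like ISO-8859-1, ASCII, etc.
Always decode your data as early as possible to turn it into real Unicode ('somestring in utf8'.decode('utf-8') == u'somestring in utf8'). Unicode objects are written u'', not ''.
When data leaves your application, always encode it in the proper encoding. For web content that is mostly UTF-8. For console output it is whatever your console encoding is. On Windows, that is not UTF-8 by default.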
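
The decode-early, encode-late advice can be sketched roughly as follows. This is only an illustration, not the code from the question: the helper names decode_page and encode_for_console are made up here, the byte string stands in for what urllib2.urlopen(...).read() would return, and cp1253 is just an assumed Greek Windows code page chosen for the demonstration. Your console may well use a different one.

```python
import sys


def decode_page(raw_bytes, source_encoding="utf-8"):
    """Decode the bytes returned by read() into a unicode string.

    source_encoding is an assumption; ideally it is taken from the
    page's Content-Type header or meta tag.
    """
    return raw_bytes.decode(source_encoding)


def encode_for_console(text, console_encoding=None):
    """Encode unicode text for output.

    Characters the target encoding cannot represent become '?'.
    """
    if console_encoding is None:
        # sys.stdout.encoding can be None when output is piped to a file.
        console_encoding = getattr(sys.stdout, "encoding", None) or "utf-8"
    return text.encode(console_encoding, "replace")


# Stand-in for urllib2.urlopen(url).read(): UTF-8 bytes of a six-letter Greek word.
raw = b"\xce\x95\xce\xbb\xce\xbb\xce\xac\xce\xb4\xce\xb1"

text = decode_page(raw)                            # bytes -> unicode, done once, early
greek_bytes = encode_for_console(text, "cp1253")   # unicode -> bytes, done late
ascii_bytes = encode_for_console(text, "ascii")    # ASCII has no Greek letters
```

Printing the raw UTF-8 bytes directly sends two bytes per Greek letter to a console that expects one byte per character in its own code page, which is probably where the gibberish comes from. Decoding first and then encoding for the console avoids that mismatch, as long as the console's code page actually contains the characters.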