Encoding problem when downloading HTML with mechanize and Python 2.6

Posted 2024-09-25 02:16:12

import mechanize

browser = mechanize.Browser()
page = browser.open(url)
html = page.get_data()  # raw byte string, not decoded yet

print html

It shows some strange characters. I suppose the data is a UTF-8 string, but Python doesn't know that and cannot display it properly.

How can I convert this string to a unicode string, like

u = u'test'


司马昭之心 2024-10-02 02:16:12

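The response was gzipped, so it has to be decompressed before it can be decoded:

import gzip

def ungzipResponse(r, b):
    # Decompress the response body in place if the server sent it gzipped.
    headers = r.info()
    # .get() avoids a KeyError when the header is absent
    if headers.get('Content-Encoding') == 'gzip':
        gz = gzip.GzipFile(fileobj=r, mode='rb')
        html = gz.read()
        gz.close()
        # assumes the page itself is UTF-8
        headers["Content-type"] = "text/html; charset=utf-8"
        r.set_data(html)
        b.set_response(r)

response = browser.open(url)
ungzipResponse(response, browser)
html = response.read()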
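
The decompression step can also be tried without mechanize. This is only a minimal sketch, assuming the body is already in memory as a byte string. `maybe_gunzip` is a hypothetical helper name, and it checks the two-byte gzip magic number instead of the Content-Encoding header. The sample body is built in memory purely for illustration.

```python
import gzip
import io

GZIP_MAGIC = b'\x1f\x8b'  # first two bytes of every gzip stream

def maybe_gunzip(body):
    """Return body decompressed if it looks gzipped, otherwise unchanged."""
    if body[:2] != GZIP_MAGIC:
        return body
    gz = gzip.GzipFile(fileobj=io.BytesIO(body), mode='rb')
    try:
        return gz.read()
    finally:
        gz.close()

# Build a gzipped sample in memory to stand in for a compressed response body.
buf = io.BytesIO()
gz = gzip.GzipFile(fileobj=buf, mode='wb')
gz.write(u'<p>caf\u00e9</p>'.encode('utf-8'))
gz.close()

html = maybe_gunzip(buf.getvalue())
text = html.decode('utf-8')  # assumes the page is UTF-8
```

Bodies that are not gzipped pass through untouched, so the helper can be called on every response.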


你是我的挚爱i 2024-10-02 02:16:12

u = html.decode('utf-8')
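
If the charset is not known in advance, it can usually be read from the Content-Type header instead of hard-coding 'utf-8'. A rough sketch, not a definitive implementation: `decode_html` is a hypothetical helper, `content_type` is assumed to be the raw header value, and falling back to UTF-8 with replacement characters is my own choice.

```python
import re

def decode_html(raw, content_type=None, default='utf-8'):
    """Decode a byte string using the charset named in a Content-Type value."""
    charset = default
    if content_type:
        match = re.search(r'charset=["\']?([\w.:-]+)', content_type, re.I)
        if match:
            charset = match.group(1)
    try:
        return raw.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown codec name or invalid bytes: fall back and mark bad bytes.
        return raw.decode(default, 'replace')

raw = u'na\u00efve'.encode('utf-8')  # stands in for the downloaded bytes
u = decode_html(raw, 'text/html; charset=UTF-8')
```

The result is a unicode object, the same type as u'test' in the question.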


ゃ人海孤独症 2024-10-02 02:16:12

You need to declare the source encoding, like this:

#!/usr/bin/python
# -*- coding: iso-8859-15 -*-

mechanize needs it.

For more information, see PEP 263:
http://www.python.org/dev/peps/pep-0263/
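
As far as I understand PEP 263, the coding line only tells the interpreter how to decode the source file itself. Bytes fetched from the network still have to be decoded explicitly, and the codec has to match the page. A minimal sketch of the difference the codec makes, where the sample bytes are an assumption standing in for a UTF-8 page:

```python
# Sample bytes standing in for data downloaded from a UTF-8 page.
raw = u'caf\u00e9'.encode('utf-8')

# Decoding with the matching codec gives back the original text.
right = raw.decode('utf-8')

# Decoding the same bytes as iso-8859-15 turns the two-byte sequence
# for the accented letter into two unrelated characters.
wrong = raw.decode('iso-8859-15')
```

Here `wrong` has one character more than `right`, which is the kind of "strange characters" described in the question.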
