使用 Python 3.x 基本获取 URL 的 HTML 正文

发布于 2024-11-02 08:34:47 字数 1753 浏览 2 评论 0原文

我是Python新手。我对 Python 2.x 中的旧 urllib 和 urllib2 与 Python 3 中的新 urllib 之间的差异有点困惑，除此之外，我不确定数据在发送到 urlopen 之前何时需要编码。

我一直在尝试使用 POST 获取 url 的 html 正文，以便我可以发送参数。该网页显示一个国家/地区在给定日期的特定小时内的阳光数据。我尝试过不进行编码/解码，打印输出是一串以 b 开头的字节。然后我尝试的代码是

import urllib.request, urllib.parse, urllib.error

def scrape(someurl):

    try:

        values = {'LANG': 'en',
                  'DATE' : '1303160400',
                  'CONT' : 'euro',
                  'LAND' : 'UK',
                  'KEY' : 'UK',
                  'SORT': '2',
                  'INT' : '06',
                  'TYPE' : 'sonnestd',
                  'ART' : 'karte',
                  'RUBRIK' : 'akt',
                  'R': '310',
                  'CEL': 'C'}

        data = urllib.parse.urlencode(values)
        data = data.encode("utf-8")
        response = urllib.request.urlopen(someurl, data)
        html = response.read().decode("utf-8")
        print(html)

    except urllib.error.HTTPError as e:
        print(e.code)
        print(e.read())

myscrape = scrape("http://www.weatheronline.co.uk/weather/maps/current")

错误是

Traceback (most recent call last):
  File "/Users/Me/Desktop/weather.py", line 57, in <module>
    myscrape = scrape("http://www.weatheronline.co.uk/weather/maps/current")
  File "/Users/Me/Desktop/weather.py", line 37, in scrape
    html = response.read().decode("utf-8")
UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 10: invalid start byte

没有编码/解码无论如何我都会得到一个可疑的短字节字符串，所以我想知道请求是否以其他方式失败

b'GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\x00\x00\x00!\xf9\x04\x01\x00\x00\x00\x00,\x00\x00\x00\x00\x01\x00\x01\x00\x00\x02\x02D\x01\x00;'

原文

I'm a Python newbie. I have been a little confused by the differences between the old urllib and urllib2 in Python 2.x and the new urllib in Python 3, and among other things I'm not sure when data needs to be encoded before being sent to urlopen.

I have been trying to fetch the html body of a url, using a POST so that I can send parameters. The webpage displays sunshine data for a country over a particular hour of a given day. I have tried without encoding/decoding and the printout is a string of bytes with b at the beginning. The code I then tried was

import urllib.request, urllib.parse, urllib.error

def scrape(someurl):

    try:

        values = {'LANG': 'en',
                  'DATE' : '1303160400',
                  'CONT' : 'euro',
                  'LAND' : 'UK',
                  'KEY' : 'UK',
                  'SORT': '2',
                  'INT' : '06',
                  'TYPE' : 'sonnestd',
                  'ART' : 'karte',
                  'RUBRIK' : 'akt',
                  'R': '310',
                  'CEL': 'C'}

        data = urllib.parse.urlencode(values)
        data = data.encode("utf-8")
        response = urllib.request.urlopen(someurl, data)
        html = response.read().decode("utf-8")
        print(html)

    except urllib.error.HTTPError as e:
        print(e.code)
        print(e.read())

myscrape = scrape("http://www.weatheronline.co.uk/weather/maps/current")

The error is

Traceback (most recent call last):
  File "/Users/Me/Desktop/weather.py", line 57, in <module>
    myscrape = scrape("http://www.weatheronline.co.uk/weather/maps/current")
  File "/Users/Me/Desktop/weather.py", line 37, in scrape
    html = response.read().decode("utf-8")
UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 10: invalid start byte

Without encoding/decoding I get a suspiciously short string of bytes anyway, so I wonder whether the request is failing in some other way

b'GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\x00\x00\x00!\xf9\x04\x01\x00\x00\x00\x00,\x00\x00\x00\x00\x01\x00\x01\x00\x00\x02\x02D\x01\x00;'

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

清风疏影 2024-11-09 08:34:47

GIF89a 表示服务器正在向您发送图像。

另外，无论如何你都不应该使用 UTF-8 进行暴力解码；您应该查看响应标头以确定要使用哪种编码。

回复收藏 0 原文

~没有更多了~

关于作者

只涨不跌

暂无简介

0 文章

0 评论

24 人气

关注发私信

友情链接

文江博客

使用 Python 3.x 基本获取 URL 的 HTML 正文

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（1）

关于作者

相关话题

热门标签

推荐作者

胡图图

zt006

z祗昰~

冰葑

野の

天空

友情链接

使用 Python 3.x 基本获取 URL 的 HTML 正文

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（1）

关于作者

相关话题

热门标签

推荐作者

胡图图

zt006

z祗昰~

冰葑

野の

天空

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。