Python automated web page download with username, password, and cookies

Posted on 2024-10-06 23:48:48

I'm trying to write a simple Python program that reads web pages and writes them to files. There are about 2000 pages of messages, numbered incrementally, but some numbers are missing.

The website is protected by a username and password, and I'm using the same credentials I normally use to access it manually. I'm using some code examples with cookie handling that I found on the official Python website, but when I try them, the website I'm trying to copy replies:

"Your browser is not accepting our cookies. To view this page, please set your browser preferences to accept cookies. (Code 0)"

Obviously there is a problem with cookies, and perhaps I'm not handling the username and password correctly. Any suggestions regarding the following code?

import urllib2
import cookielib
import string
import urllib
def cook():
    url="http://www.URL.com/message/"
    cj = cookielib.LWPCookieJar()
    authinfo = urllib2.HTTPBasicAuthHandler()
    realm = "http://www.URL.com"
    username = "ID"
    password = "PSWD"
    host = "http://www.URL.com/message/"
    authinfo.add_password(realm, host, username, password)
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj), authinfo)
    urllib2.install_opener(opener)

    # Create request object
    txheaders = { 'User-agent' : "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)" }
    try:
        req = urllib2.Request(url, None, txheaders)
        cj.add_cookie_header(req)
        f = urllib2.urlopen(req)

    except IOError, e:
        print "Failed to open", url
        if hasattr(e, 'code'):
            print "Error code:", e.code

    else:

        print f

cook
url="http://www.URL.com/message/"
urllib.urlretrieve(url + '1', 'filename')
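
Two things in the code above look like likely culprits, although without access to the site this is only a guess. First, `cook` is referenced but never called (the parentheses are missing), so the cookie-aware opener is never built or installed. Second, in Python 2 `urllib.urlretrieve` does not go through the `urllib2` opener, so the final download would carry neither the cookie jar nor the credentials even if `cook()` had run. The sketch below is a Python 3 rewrite (`urllib.request` and `http.cookiejar` replace `urllib2` and `cookielib`) that sends every request through one opener. The function names `make_opener` and `download_messages`, the `message_N.html` file naming, and the User-Agent string are illustrative assumptions, not part of the original code. It also assumes the site really uses HTTP Basic authentication, as the original code does; if the site uses a login form instead, the form fields would have to be POSTed through the same opener first.

```python
import http.cookiejar
import urllib.error
import urllib.request
from pathlib import Path


def make_opener(top_url, username, password):
    """Build an opener that keeps cookies and answers HTTP Basic auth challenges."""
    cookie_jar = http.cookiejar.CookieJar()
    # A default-realm password manager avoids having to guess the realm string.
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passwords.add_password(None, top_url, username, password)
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(cookie_jar),
        urllib.request.HTTPBasicAuthHandler(passwords),
    )
    # Hypothetical User-Agent value; some sites reject the library default.
    opener.addheaders = [("User-Agent", "Mozilla/5.0 (compatible; message-fetcher)")]
    return opener, cookie_jar


def download_messages(opener, base_url, numbers, out_dir):
    """Fetch base_url + n for each n and save it; skip numbers that fail to open."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for n in numbers:
        try:
            # Every request goes through the same opener, so cookies set by
            # earlier responses are sent back on later requests.
            with opener.open(base_url + str(n)) as response:
                data = response.read()
        except urllib.error.URLError as err:  # HTTPError is a subclass of URLError
            print("Skipping", n, "->", err)
            continue
        (out_dir / "message_{}.html".format(n)).write_bytes(data)
        saved.append(n)
    return saved
```

A call might look like `opener, _ = make_opener("http://www.URL.com/message/", "ID", "PSWD")` followed by `download_messages(opener, "http://www.URL.com/message/", range(1, 2001), "messages")`, where the upper bound is a guess based on "about 2000 pages". Numbers that fail to open are reported and skipped, which is one way to handle the missing message numbers.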
