Python automated web page download with username, password, and cookies

Posted 2024-10-06 23:48:48 · 1327 words · 4 views · 0 comments

I'm trying to implement a simple Python program that reads web pages and writes them to files. There are about 2000 pages of messages, numbered incrementally, but some numbers are missing.

The website is protected by a username and password, and I'm using the same credentials I normally use to access it manually. I'm using some code examples with cookie handling that I found on the official Python website, but when I try them, the website I'm trying to copy replies:

"Your browser is not accepting our cookies. To view this page, please set your browser preferences to accept cookies. (Code 0)"

Obviously there is a problem with cookies, and perhaps I'm not handling the username and password correctly. Any suggestions regarding the following code?

import urllib2
import cookielib
import string
import urllib

def cook():
    url = "http://www.URL.com/message/"
    cj = cookielib.LWPCookieJar()
    authinfo = urllib2.HTTPBasicAuthHandler()
    realm = "http://www.URL.com"
    username = "ID"
    password = "PSWD"
    host = "http://www.URL.com/message/"
    authinfo.add_password(realm, host, username, password)
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj), authinfo)
    urllib2.install_opener(opener)

    # Create request object
    txheaders = {'User-agent': "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)"}
    try:
        req = urllib2.Request(url, None, txheaders)
        cj.add_cookie_header(req)
        f = urllib2.urlopen(req)
    except IOError, e:
        print "Failed to open", url
        if hasattr(e, 'code'):
            print "Error code:", e.code
    else:
        print f

cook
url = "http://www.URL.com/message/"
urllib.urlretrieve(url + '1', 'filename')
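Two things in the code above look likely to contribute to the cookie error. The bare `cook` never calls the function, so the opener is never installed. And in Python 2, `urllib.urlretrieve` does not go through the opener installed with `urllib2.install_opener`, so that download would send neither the cookies nor the credentials. Below is a minimal sketch of one way to restructure this, written with the Python 3 equivalents (`urllib.request` for `urllib2`, `http.cookiejar` for `cookielib`). It assumes the site really uses HTTP Basic authentication, as the question's code does; if it uses a login form instead, the credentials would have to be POSTed to the login URL, which the question does not show. The names `make_opener` and `download_pages` and the `message_N.html` file naming are hypothetical, chosen only for illustration.

```python
import http.cookiejar
import os
import urllib.error
import urllib.request

BASE_URL = "http://www.URL.com/message/"  # placeholder URL from the question


def make_opener(base_url, username, password, cookie_jar=None):
    """Build an opener that keeps cookies and answers Basic auth challenges.

    Hypothetical helper; assumes the site uses HTTP Basic authentication.
    """
    if cookie_jar is None:
        cookie_jar = http.cookiejar.CookieJar()
    # The default-realm password manager matches on the URL alone, so the
    # realm string the server sends does not have to be known in advance.
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, base_url, username, password)
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(cookie_jar),
        urllib.request.HTTPBasicAuthHandler(password_mgr),
    )
    # Same User-Agent string as in the question.
    opener.addheaders = [
        ("User-Agent", "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)")
    ]
    return opener


def download_pages(opener, base_url, numbers, out_dir="."):
    """Fetch each numbered page through the same opener and save it to a file.

    Numbers that return HTTP 404 are treated as missing and skipped;
    any other HTTP error is re-raised. Returns (saved, missing).
    """
    saved, missing = [], []
    for n in numbers:
        url = base_url + str(n)
        try:
            # Every request reuses the opener, so cookies set by earlier
            # responses are sent back automatically.
            with opener.open(url) as response:
                body = response.read()
        except urllib.error.HTTPError as err:
            if err.code == 404:
                missing.append(n)
                continue
            raise
        # File naming is an assumption made for this sketch.
        path = os.path.join(out_dir, "message_%d.html" % n)
        with open(path, "wb") as fh:
            fh.write(body)
        saved.append(n)
    return saved, missing
```

A call such as `download_pages(make_opener(BASE_URL, "ID", "PSWD"), BASE_URL, range(1, 2001))` would then walk the numbered pages; the range is a guess based on the "about 2000 pages" in the question. The cookie processor handles the `Cookie` header itself, so the manual `cj.add_cookie_header(req)` call is not needed.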
