Automated web page download in Python with username, password and cookies
I'm trying to implement a simple Python program that reads web pages and writes them to files. There are about 2000 message pages, incrementally numbered, but some numbers are missing.
The site is protected by a username and password, and I'm using the same username and password I normally use to access it manually. I'm using some code examples with cookie handling that I found on the official Python website, but when I try them, the site I'm trying to copy replies:
"Your browser is not accepting our cookies. To view this page, please set your browser preferences to accept cookies. (Code 0)"
Obviously there is a problem with the cookies, and perhaps I'm not handling the username and password correctly. Any suggestions about the following code?
import urllib2
import cookielib
import string
import urllib

def cook():
    url = "http://www.URL.com/message/"
    cj = cookielib.LWPCookieJar()
    authinfo = urllib2.HTTPBasicAuthHandler()
    realm = "http://www.URL.com"
    username = "ID"
    password = "PSWD"
    host = "http://www.URL.com/message/"
    authinfo.add_password(realm, host, username, password)

    # Install an opener that handles both cookies and HTTP basic auth
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj), authinfo)
    urllib2.install_opener(opener)

    # Create request object with a browser-like User-Agent header
    txheaders = {'User-agent': "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)"}
    try:
        req = urllib2.Request(url, None, txheaders)
        cj.add_cookie_header(req)
        f = urllib2.urlopen(req)
    except IOError, e:
        print "Failed to open", url
        if hasattr(e, 'code'):
            print "Error code:", e.code
    else:
        print f

cook()
url = "http://www.URL.com/message/"
urllib.urlretrieve(url + '1', 'filename')
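For the actual bulk download, the loop I have in mind looks roughly like the sketch below, continuing from the code above after cook() has installed the opener. This is only a sketch under my own assumptions: that the pages are simply named 1 through 2000, and that a missing number comes back as an HTTP error that can be skipped. Note that urllib2.urlopen() goes through the installed opener, so the cookie jar and credentials set up in cook() are sent with each request, whereas urllib.urlretrieve() does not use that opener.

# Sketch of the download loop, run after cook() has installed the opener.
# The page naming (1..2000) and the handling of missing numbers as HTTP
# errors are assumptions on my part.
base_url = "http://www.URL.com/message/"
for n in range(1, 2001):
    page_url = base_url + str(n)
    try:
        # urlopen() uses the globally installed opener, so the cookie jar
        # and the basic-auth credentials are sent with every request
        resp = urllib2.urlopen(page_url)
    except urllib2.HTTPError, e:
        # a missing message number presumably returns an HTTP error (e.g. 404)
        print "Skipping", page_url, "- HTTP error", e.code
        continue
    out = open("message_%d.html" % n, "wb")
    out.write(resp.read())
    out.close()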