Python 3.[12] urllib
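I am working on a little script that grabs some files from a website. First I build a list of potential URLs within the site. This worked fine with Python 3.1 but not with Python 3.2. I guess it is an encoding issue, but I am not sure how to handle it elegantly. Can you help me?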
import http.cookiejar
import re
import urllib.parse
import urllib.request

# BASE_URL is defined elsewhere in the script.

def get_urls(username, password, userid):
    cj = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
    login_data = urllib.parse.urlencode({'login': username, 'password': password})
    opener.open(BASE_URL + "/bg/login", login_data)
    url = BASE_URL + "/bg/user/" + userid + "?finished=1"
    resp = opener.open(url)
    result = resp.read()
    txt = result.decode("iso-8859-1")
    liste = re.findall(r"/bg/export/\d{4,8}", txt)
    return liste
The problem should be here:
login_data = urllib.parse.urlencode({'login': username, 'password': password})
opener.open(BASE_URL + "/bg/login", login_data)
urllib.parse.urlencode returns a str, but in Python 3.2 the POST data passed to opener.open() must be bytes or an iterable of bytes; a str is rejected with a TypeError. Encode the string before sending it, e.g. login_data.encode("ascii").
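A minimal sketch of the fix, assuming the rest of the script stays as posted: the login body is encoded to bytes before it is passed to opener.open(). BASE_URL is only a placeholder value here, and encode_login_data and extract_export_urls are hypothetical helpers split out so they can be checked without a network connection.

```python
import http.cookiejar
import re
import urllib.parse
import urllib.request

# Placeholder (assumption): the original script defines BASE_URL elsewhere.
BASE_URL = "http://example.com"


def encode_login_data(username, password):
    """Hypothetical helper: build the login POST body as bytes.

    urlencode() returns a str, and urllib.request wants bytes for POST
    data, so the result is encoded before it is sent. ASCII suffices
    because urlencode() percent-escapes every non-ASCII character.
    """
    query = urllib.parse.urlencode({'login': username, 'password': password})
    return query.encode("ascii")


def extract_export_urls(txt):
    """Hypothetical helper: return every /bg/export/<4-8 digits> path in txt."""
    return re.findall(r"/bg/export/\d{4,8}", txt)


def get_urls(username, password, userid):
    cj = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
    # Log in with a bytes body; the session cookie is stored in cj.
    opener.open(BASE_URL + "/bg/login", encode_login_data(username, password))
    url = BASE_URL + "/bg/user/" + userid + "?finished=1"
    resp = opener.open(url)
    txt = resp.read().decode("iso-8859-1")
    return extract_export_urls(txt)
```

For example, encode_login_data("user", "p&ss") gives b"login=user&password=p%26ss", which opener.open() accepts as POST data.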