How do I make mechanize not fail on the forms on this page?
import mechanize

url = 'http://steamcommunity.com'
br = mechanize.Browser(factory=mechanize.RobustFactory())
br.open(url)
print br.request
print br.form
# iterate over every form mechanize found on the page
for each in br.forms():
    print each
    print
The above code results in:
Traceback (most recent call last):
File "./mech_test.py", line 12, in <module>
for each in br.forms():
File "build/bdist.linux-i686/egg/mechanize/_mechanize.py", line 426, in forms
File "build/bdist.linux-i686/egg/mechanize/_html.py", line 559, in forms
File "build/bdist.linux-i686/egg/mechanize/_html.py", line 228, in forms
mechanize._html.ParseError
My specific goal is to use the login form, but I can't even get mechanize to recognize that there are any forms. Even using what I think is the most basic way of selecting any form, br.select_form(nr=0), results in the same traceback. The form's enctype is multipart/form-data, if that makes a difference.
I guess it all boils down to a two-part question: how can I get mechanize to work with this page, or, if that isn't possible, what other approach is there that still maintains cookies?
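For reference, here is a minimal sketch of the kind of cookie-preserving fallback I have in mind if the form parsing can't be fixed: drop Browser entirely and wire a CookieJar into a plain opener, building the login POST by hand. The handler combination below is an assumption on my part and hasn't been tested against this page.

import mechanize

# A CookieJar attached to the opener keeps cookies across requests
# even without mechanize.Browser (assumption: HTTPCookieProcessor is
# enough here; the login POST itself would still have to be built by
# hand, e.g. with urllib.urlencode, since no form parsing happens).
cj = mechanize.CookieJar()
opener = mechanize.build_opener(
    mechanize.HTTPCookieProcessor(cj),
    mechanize.HTTPSHandler(),
)

response = opener.open('https://steamcommunity.com')
html = response.read()
print [c.name for c in cj]  # cookies set by the site so far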
Edit: As mentioned below, this redirects to 'https://steamcommunity.com'.
Mechanize can successfully retrieve the HTML, as can be seen with the following code:
import mechanize

url = 'https://steamcommunity.com'
hh = mechanize.HTTPSHandler()  # explicit HTTPS handler for the SSL redirect
hh.set_http_debuglevel(1)      # dump the request/response exchange
opener = mechanize.build_opener(hh)
response = opener.open(url)
contents = response.readlines()
print contents
Comments (2)
Did you mention that the website is redirecting to an https (SSL) server?
Well, try to set a new HTTPS handler like this:
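Something along these lines, assuming mechanize's urllib2-style build_opener API (this is a sketch of the idea, essentially the same handler setup your edit quotes; I haven't run it against the site myself):

import mechanize

# Route the request through an explicit HTTPS handler and turn on
# debug output so the redirect to the SSL server can be inspected.
hh = mechanize.HTTPSHandler()
hh.set_http_debuglevel(1)

opener = mechanize.build_opener(hh)
response = opener.open('https://steamcommunity.com')
print response.read()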
Use this secret, I'm sure this will work for you ;)