How can I keep mechanize from failing on the forms on this page?

Posted 2024-07-21 06:02:54

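import mechanize

url = 'http://steamcommunity.com'

# RobustFactory is meant to tolerate malformed HTML
br = mechanize.Browser(factory=mechanize.RobustFactory())

br.open(url)
print(br.request)
print(br.form)  # None until a form has been selected
for each in br.forms():  # the ParseError is raised here
    print(each)
    print("")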

The above code results in:

Traceback (most recent call last):
  File "./mech_test.py", line 12, in <module>
    for each in br.forms():
  File "build/bdist.linux-i686/egg/mechanize/_mechanize.py", line 426, in forms
  File "build/bdist.linux-i686/egg/mechanize/_html.py", line 559, in forms
  File "build/bdist.linux-i686/egg/mechanize/_html.py", line 228, in forms
mechanize._html.ParseError

My specific goal is to use the login form, but I can't even get mechanize to recognize that there are any forms. Even what I think is the most basic way of selecting a form, br.select_form(nr=0), produces the same traceback. The form's enctype is multipart/form-data, if that makes a difference.

I guess it all boils down to a two-part question: how can I get mechanize to work with this page, or, if that's not possible, what's another way to do this while maintaining cookies?

Edit: As mentioned below, this redirects to 'https://steamcommunity.com'.

Mechanize can successfully retrieve the HTML, as the following code shows:

import mechanize

url = 'https://steamcommunity.com'

hh = mechanize.HTTPSHandler()  # handler for the HTTPS URL
hh.set_http_debuglevel(1)      # echo the HTTP exchange for debugging
opener = mechanize.build_opener(hh)
response = opener.open(url)
contents = response.readlines()

print(contents)


情话已封尘 2024-07-28 06:02:54

Did you mention that the website redirects to an HTTPS (SSL) server?

If so, try setting up a new HTTPS handler like this:

mechanize.HTTPSHandler()
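
On the second half of the question (another way that still maintains cookies), here is a minimal sketch using only Python 3's standard library instead of mechanize. It assumes that an opener built from an HTTPS handler plus a cookie processor is enough for the login flow, which has not been checked against the site. The helper name `make_cookie_opener` is made up for illustration.

```python
import http.cookiejar
import urllib.request


def make_cookie_opener():
    """Build an opener that handles HTTPS and remembers cookies between requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPSHandler(),
        urllib.request.HTTPCookieProcessor(jar),  # stores and resends cookies
    )
    return opener, jar


opener, jar = make_cookie_opener()
# Network call left commented out so the sketch runs offline:
# response = opener.open("https://steamcommunity.com")
# print([cookie.name for cookie in jar])
```

Every request made through the same `opener` shares `jar`, so cookies set by one response are sent with the next request.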


遥远的绿洲 2024-07-28 06:02:54

Use this trick; I'm sure it will work for you ;)

br = mechanize.Browser(factory=mechanize.DefaultFactory(i_want_broken_xhtml_support=True))
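
If the form parser still raises ParseError, one possible fallback is to pull the forms out of the raw HTML yourself. The sketch below uses Python 3's standard `html.parser`, which tolerates unclosed tags. It is an illustration, not part of mechanize. The names `FormCollector` and `find_forms`, the sample markup, and the field names in it are all invented for the example and do not come from the real page.

```python
from html.parser import HTMLParser


class FormCollector(HTMLParser):
    """Collect <form> elements and their named fields from an HTML document."""

    def __init__(self):
        super().__init__()
        self.forms = []       # one dict per <form> encountered
        self._current = None  # the form currently open, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action") or "",
                "method": (attrs.get("method") or "get").lower(),
                "enctype": attrs.get("enctype") or "application/x-www-form-urlencoded",
                "fields": {},
            }
            self.forms.append(self._current)
        elif tag in ("input", "select", "textarea") and self._current is not None:
            name = attrs.get("name")
            if name:  # controls without a name are never submitted
                self._current["fields"][name] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def find_forms(html):
    """Return a list of dicts describing each form found in the HTML text."""
    parser = FormCollector()
    parser.feed(html)
    parser.close()
    return parser.forms


# Hypothetical, deliberately sloppy markup (unclosed <p>, missing </html>).
SAMPLE = """
<html><body>
<form action="/login" method="POST" enctype="multipart/form-data">
  <input type="text" name="steamAccountName">
  <input type="password" name="steamPassword" />
  <p>unclosed paragraph
  <input type="submit" value="Sign in">
</form>
</body>
"""

for form in find_forms(SAMPLE):
    print(form["method"], form["action"], sorted(form["fields"]))
```

The collected action, method, and field names could then be used to build the login request by hand and send it through whatever cookie-aware opener you are using.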
