Error when using Python/mechanize select_form()?

Posted 2024-08-18 06:41:22

I am trying to scrape some data from a website.
The script I am trying to write should fetch the content of this page:

http://www.atpworldtour.com/Rankings/Singles.aspx

It should simulate a user going through every option for Additional Standings and for the dates, clicking Go each time, and then, after fetching the data, using the back function.

For now, I have only been trying to select this option for Additional Standings:

            <option value="101" >101-200</option>

Here is my (poor) attempt to try to do this:

from mechanize import Browser
from BeautifulSoup import BeautifulSoup
import re
import urllib2

br = Browser()
br.open("http://www.atpworldtour.com/Rankings/Singles.aspx")
br.select_form(nr=0)  # select the first form on the page; this is the call that fails
br["r"] = "101"       # choose the 101-200 option of the "r" select

response = br.submit()

However, it fails at select_form(nr=0), which should select the first form.

This is the log returned by Python:

>>> from mechanize import Browser
>>>
>>> from BeautifulSoup import BeautifulSoup
>>> import re
>>> import urllib2
>>>
>>>
>>>
>>> br = Browser();
>>> br.open("http://www.atpworldtour.com/Rankings/Singles.aspx");
<response_seek_wrapper at 0x311bb48L whose wrapped object = <closeable_response at 0x311be88L whose fp = <socket._fileobject object at 0x0000000002C94408>>>
>>> br.select_form(nr=0);
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "build\bdist.win-amd64\egg\mechanize\_mechanize.py", line 505, in select_form
  File "build\bdist.win-amd64\egg\mechanize\_html.py", line 546, in __getattr__
  File "build\bdist.win-amd64\egg\mechanize\_html.py", line 559, in forms
  File "build\bdist.win-amd64\egg\mechanize\_html.py", line 228, in forms
mechanize._html.ParseError

I could not find a proper explanation of all the functions on the mechanize home page. Can anyone either point me to a good tutorial on using forms with mechanize or help me with this particular issue?

Anthony

Comments (3)

任谁 2024-08-25 06:41:22

I think you are using the library correctly, but the parser seems to be having trouble with that particular page. I'm using the library in the same way on another page ("http://flashcarddb.com/login") and it does not raise an error.
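If the form parser is what chokes, it may help to confirm what the markup offers without going through mechanize at all. Below is a minimal sketch using only the standard library's html.parser, which is lenient with messy markup. It is written for Python 3, unlike the Python 2 script in the question. The select name "r" and the 101-200 option are taken from the question; the rest of SAMPLE_HTML (form action, method, the first option, the Go button) is made up for illustration and is not the real ATP page. The class and function names are hypothetical too.

```python
from html.parser import HTMLParser

# Hypothetical fragment modeled on the question: the select name "r" and the
# 101-200 option come from the post; everything else is an assumption.
SAMPLE_HTML = """
<form action="Singles.aspx" method="get">
  <select name="r">
    <option value="1" selected>1-100</option>
    <option value="101" >101-200</option>
  </select>
  <input type="submit" value="Go">
</form>
"""


class SelectOptionLister(HTMLParser):
    """Collect [value, label] pairs for each <select>, keyed by its name."""

    def __init__(self):
        super().__init__()
        self.options = {}        # select name -> list of [value, label]
        self._select = None      # name of the <select> currently open
        self._in_option = False  # True while inside an <option>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select":
            self._select = attrs.get("name")
            self.options[self._select] = []
        elif tag == "option" and self._select is not None:
            self.options[self._select].append([attrs.get("value"), ""])
            self._in_option = True

    def handle_endtag(self, tag):
        if tag == "select":
            self._select = None
            self._in_option = False
        elif tag == "option":
            self._in_option = False

    def handle_data(self, data):
        # Append visible text to the label of the option being read.
        if self._in_option:
            self.options[self._select][-1][1] += data.strip()


def list_select_options(html):
    """Return {select_name: [[value, label], ...]} for the given HTML text."""
    parser = SelectOptionLister()
    parser.feed(html)
    parser.close()
    return parser.options


print(list_select_options(SAMPLE_HTML))
```

To try it on the real page, you would pass the downloaded page text to list_select_options() instead of SAMPLE_HTML. This only inspects the markup; it does not submit the form or fix the ParseError itself.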

握住我的手 2024-08-25 06:41:22

I just ran into this same problem. The page I was accessing passed W3C validation, so I didn't think it was a markup problem. However, html tidy complained that the page had a within a . Once I fixed that, mechanize started working.

Also, I saw a reply to this issue on a mailing list. I'd just like to note that adding factory=mechanize.RobustFactory() to mechanize.Browser() didn't change the result.

忆沫 2024-08-25 06:41:22

Clue: define more about your mechanize.Browser()
