Python WWW macro
I need something like iMacros for Python. It would be great to have something like this:
browse_to('www.google.com')
type_in_input('search', 'query')
click_button('search')
list = get_all('<p>')
Do you know of something like that?
Thanks in advance,
Etam.
3 Answers
Almost a direct fulfillment of the wishes in the question - twill. (pyparsing, mechanize, and BeautifulSoup are included with twill for convenience.) A Python API example:
Use mechanize. Other than executing JavaScript in a page, it's pretty good.
Another thing to consider is writing your own script. It's actually not too tough once you get the hang of it, and without invoking half a dozen huge libraries it might even be faster (but I'm not sure). I use a web debugger called "Charles" to surf websites that I want to scrape. It logs all outgoing/incoming HTTP communication, and I use the records to reverse-engineer the query strings. Manipulating them in Python makes for quite speedy, flexible scraping.
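The query-string manipulation this answer describes can be done with the standard library alone. A small sketch, assuming a URL captured from an HTTP debugger (the `example.com` URL and its parameters are made up for illustration):

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# A request URL captured with an HTTP debugger such as Charles
# (made-up host and parameters, for illustration only):
captured = "https://example.com/search?q=python&page=1&sort=date"

# Parse it so individual parameters can be inspected and changed.
parts = urlsplit(captured)
params = {k: v[0] for k, v in parse_qs(parts.query).items()}

# Tweak a parameter and rebuild the URL for the next request.
params["page"] = "2"
next_url = f"{parts.scheme}://{parts.netloc}{parts.path}?{urlencode(params)}"
print(next_url)  # https://example.com/search?q=python&page=2&sort=date
```

From here, each rebuilt URL can be fetched with `urllib.request.urlopen` or any HTTP client, which is the "speedy, flexible scraping" loop the answer has in mind.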