Web scraping through a web service API?
How would I go about doing the following?
I want to build a web service for my application that grabs a piece of data from an external website that requires the user to log in. The website has no public API, hence the scraper.
Is there a library that performs the following functions? If not, what should I do?
- Automatically fill in a form and click
- Automatically press the submit button
- Check which URL the user has landed on, and redirect the user to a URL
- Grab data from a label
EDIT: What I'm asking is whether there is a web service, library, etc. that makes it easier to perform screen scraping/automation functions.
Comments (2)
Instead of filling in a form and virtually clicking buttons, look at the source of the form and figure out how the data is being submitted. In most cases you can simply send a POST request with the login data. If there is something special beyond a simple POST request, I use this addon to figure out which requests are being made that you can't see. Using C#, I would use the HttpWebRequest class because it can handle cookies for you (through its CookieContainer property).
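To make the idea concrete, here is a minimal sketch in Python rather than C#, using only the standard library. It is not the only way to do this: it assumes the login form is a plain URL-encoded POST, that the site keeps the session in cookies, and that the value you want sits in an element with a known `id`. The function names (`make_session`, `login`, `fetch`, `extract_text_by_id`) and any URLs or field names you pass in are hypothetical; read the real ones from the form's source.

```python
import http.cookiejar
import urllib.parse
import urllib.request
from html.parser import HTMLParser


def make_session():
    """Return an opener that stores and resends cookies, like a browser session."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def login(session, login_url, fields):
    """POST the form fields and return the URL we landed on after any redirects."""
    body = urllib.parse.urlencode(fields).encode("utf-8")
    with session.open(login_url, data=body, timeout=10) as resp:
        return resp.geturl()


def fetch(session, url):
    """GET a page with the session's cookies and return it as text."""
    with session.open(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class _TextById(HTMLParser):
    """Collect the text inside the first element whose id attribute matches.

    Simplification: nested elements with the same tag name are not tracked.
    """

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.tag = None      # tag name of the matched element while inside it
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if not self.done and self.tag is None and dict(attrs).get("id") == self.element_id:
            self.tag = tag

    def handle_endtag(self, tag):
        if self.tag is not None and tag == self.tag:
            self.tag = None
            self.done = True

    def handle_data(self, data):
        if self.tag is not None:
            self.parts.append(data)


def extract_text_by_id(html, element_id):
    """Return the stripped text of the element with the given id ('' if absent)."""
    parser = _TextById(element_id)
    parser.feed(html)
    return "".join(parser.parts).strip()
```

Usage would look roughly like `s = make_session()`, then `landed = login(s, login_url, {"user": ..., "password": ...})`, then `extract_text_by_id(fetch(s, data_url), "some-id")`. Comparing `landed` with the URL you expect after a successful login is a cheap way to detect a failed login. Pages that build their content with JavaScript will not work with this approach and need a real browser driver instead.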
If the website does not ban robots, you can use YQL to simulate everything you need. However, it can be difficult or even impossible, as you basically have to implement a text-only browser in JavaScript.