Can Jsoup simulate a button press?
Can you use Jsoup to submit a search to Google, but instead of sending your request via "Google Search" use "I'm Feeling Lucky"? I would like to capture the name of the site that would be returned.
I see lots of examples of submitting forms, but never a way to specify a specific button to perform the search or form submission.
If Jsoup won't work, what would?
Comments (3)
According to the HTML source of http://google.com, the "I'm Feeling Lucky" button has the name btnI. So just adding the btnI parameter to the query string should do (the value doesn't matter). So, this Jsoup should do:
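As a rough sketch of the query string described above, the URL can be built with the JDK alone and then passed to Jsoup, for example via Jsoup.connect(url).get(). The /search path, the q parameter name, the value 1 for btnI, and the class name LuckyUrl are assumptions for illustration and are not taken from the answer.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class LuckyUrl {
    // Builds a Google search URL that carries the btnI parameter.
    // Assumption: the search endpoint is /search and the query parameter is q.
    static String build(String query) {
        // URL-encode the search terms so spaces and symbols are safe in a query string.
        String encoded = URLEncoder.encode(query, StandardCharsets.UTF_8);
        // The value of btnI does not matter; 1 is an arbitrary placeholder.
        return "http://www.google.com/search?q=" + encoded + "&btnI=1";
    }

    public static void main(String[] args) {
        System.out.println(build("jsoup cookbook"));
    }
}
```

Keeping the URL construction in its own method makes it easy to check the query string before any request is sent.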
However, this gave a 403 (Forbidden) error.
Perhaps Google was sniffing the user agent and discovering it to be Java. So, I changed it:
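A minimal sketch of overriding the user agent follows. It uses the JDK's java.net.http (Java 11+) instead of Jsoup so that it runs without extra dependencies, and it only builds the request without sending it. The "Mozilla/5.0" string, the example URL, and the class name BrowserLikeRequest are assumptions. With Jsoup, the corresponding call is Connection.userAgent(String), as in Jsoup.connect(url).userAgent("Mozilla/5.0").get().

```java
import java.net.URI;
import java.net.http.HttpRequest;

class BrowserLikeRequest {
    // Builds (but does not send) a GET request with a custom User-Agent header,
    // replacing the default Java client identification.
    static HttpRequest build(String url, String userAgent) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", userAgent)
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Assumption: the URL shape follows the btnI query string discussed above.
        HttpRequest request = build("http://www.google.com/search?q=jsoup&btnI=1", "Mozilla/5.0");
        System.out.println(request.headers().firstValue("User-Agent").orElse("(none)"));
    }
}
```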
This yields (as expected):
The 403 does indicate, however, that Google isn't necessarily happy with bots like this. You might get (temporarily) IP-banned if you do this too often.
I'd try HtmlUnit for navigating through a site, and Jsoup for scraping.
Yes, it can, if you are able to figure out how Google search queries are made. But Google does not allow this, even if you succeed. You should use their official API to make automated search queries.
http://code.google.com/intl/en-US/apis/customsearch/v1/overview.html