Can Jsoup simulate a button press?

Posted on 2024-12-05 14:29:01

Can you use Jsoup to submit a search to Google, but instead of sending your request via "Google Search" use "I'm Feeling Lucky"? I would like to capture the name of the site that would be returned.

I see lots of examples of submitting forms, but never a way to specify a specific button to perform the search or form submission.

If Jsoup won't work, what would?

Comments (3)

聚集的泪 2024-12-12 14:29:01

According to the HTML source of http://google.com, the "I'm Feeling Lucky" button has the name btnI:

<input value="I'm Feeling Lucky" name="btnI" type="submit" onclick="..." />

So, just adding the btnI parameter to the query string should do (the value doesn't matter):

http://www.google.com/search?hl=en&btnI=1&q=your+search+term
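
For search terms that contain spaces or reserved characters, the query value has to be URL-encoded before it is appended. The following is a minimal sketch of that step using only the JDK (Java 10 or later); the class name LuckyUrl and the method build are hypothetical names chosen for illustration, not part of Jsoup.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class LuckyUrl {
    // Builds the btnI URL shown above for an arbitrary search term.
    // The value of btnI is arbitrary; "1" is used as in the answer.
    static String build(String query) {
        // Form encoding: spaces become '+', reserved characters are percent-encoded.
        String encoded = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return "http://www.google.com/search?hl=en&btnI=1&q=" + encoded;
    }

    public static void main(String[] args) {
        System.out.println(build("your search term"));
    }
}
```

The resulting string can be passed to Jsoup.connect(...) in place of a hard-coded URL.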

So, this Jsoup code should do it:

String url = "http://www.google.com/search?hl=en&btnI=1&q=balusc";
Document document = Jsoup.connect(url).get();
System.out.println(document.title());

However, this gave a 403 (Forbidden) error.

Exception in thread "main" java.io.IOException: 403 error loading URL http://www.google.com/search?hl=en&btnI=1&q=balusc
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:387)
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:364)
    at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:143)
    at org.jsoup.helper.HttpConnection.get(HttpConnection.java:132)
    at test.Test.main(Test.java:17)

Perhaps Google was sniffing the user agent and discovering it to be Java. So, I changed it:

String url = "http://www.google.com/search?hl=en&btnI=1&q=balusc";
Document document = Jsoup.connect(url).userAgent("Mozilla").get();
System.out.println(document.title());

This yields (as expected):

The BalusC Code

The 403 is, however, an indication that Google isn't necessarily happy with bots like this. You might get (temporarily) IP-banned if you do this too often.
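
If the site's address is wanted rather than its page title, one option is to take the URL the request finally lands on and keep only its host. With Jsoup, that URL should be available from Connection.Response.url() after calling execute() instead of get(), assuming Google actually redirects the request to the target site. The sketch below covers only the host extraction with the JDK; SiteName and hostOf are hypothetical names, and the URL in main is a placeholder.

```java
import java.net.URI;

class SiteName {
    // Returns the host part of a URL; null if the URL has no host component.
    static String hostOf(String url) {
        return URI.create(url).getHost();
    }

    public static void main(String[] args) {
        // Placeholder URL standing in for whatever the redirect resolves to.
        System.out.println(hostOf("http://example.com/some/page?x=1"));
    }
}
```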

玩心态 2024-12-12 14:29:01

I'd try HtmlUnit for navigating through a site, and Jsoup for scraping.

抚你发端 2024-12-12 14:29:01

Yes, it can, if you are able to figure out how Google search queries are made. But this is not allowed by Google, even if you succeed. You should use their official API to make automated search queries.

http://code.google.com/intl/en-US/apis/customsearch/v1/overview.html
