How to get the "next page" of Google results using HtmlUnit

Posted on 2025-01-07 01:11:33 · 1822 characters · 3 views · 0 comments

I use the code below to fetch the first two pages of Google search results,
but I can only fetch the first page (when I request page 2, I get the same content as page 1).

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlElement;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlTextInput;


/**
 * A simple Google search test using HtmlUnit.
 *
 * @author Rahul Poonekar
 * @since Apr 18, 2010
 */
public class Author_search {
    static final WebClient browser;

    static {
        browser = new WebClient();
        browser.setJavaScriptEnabled(false);
    }

    public static void main(String[] arguments) {
        searchTest();
    }

    private static void searchTest() {
        HtmlPage currentPage = null;

        try {
            currentPage = (HtmlPage) browser.getPage("http://www.google.com");
        } catch (Exception e) {
            System.out.println("Could not open browser window");
            e.printStackTrace();
        }
        System.out.println("Simulated browser opened.");

        try {
            ((HtmlTextInput) currentPage.getElementByName("q")).setValueAttribute("xxoo");
            currentPage = currentPage.getElementByName("btnG").click();
            System.out.println("contents: " + currentPage.asText());
            HtmlElement next = (HtmlElement)currentPage.getByXPath("//span[contains(text(), 'Next')]").get(0);
            currentPage = next.click();
            System.out.println("contents: " + currentPage.asText());
        } catch (Exception e) {
            System.out.println("Could not search");
            e.printStackTrace();
        }
    } 
}

Can anybody tell me how to fix this?

By the way:

  1. How can I change the language settings in Google using HtmlUnit? Is
    there a convenient way?
  2. Does HtmlUnit treat the HTML the way Firebug in Firefox does, or does
    it just treat it like the text you get from "File -> Save"? I believe
    it treats the page the way a browser would; am I right?


Comments (1)

べ繥欢鉨o。 2025-01-14 01:11:33

I replaced:

HtmlElement next = (HtmlElement)currentPage.getByXPath("//span[contains(text(),'Next')]").get(0);
currentPage = next.click();

with:

HtmlAnchor nextAnchor = currentPage.getAnchorByText("Next");
currentPage = nextAnchor.click();
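
As a possible alternative to clicking the "Next" link, the second page can be requested directly by URL. The sketch below is only an illustration and rests on assumptions that are not in the original post: that Google accepts a `start` query parameter as a result offset, an `hl` parameter for the interface language, and returns 10 results per page. The class name `GoogleSearchUrl` and the method `build` are hypothetical helpers, not part of HtmlUnit.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds a Google search URL for a given result page.
final class GoogleSearchUrl {
    // Assumption: Google shows 10 results per page by default.
    static final int RESULTS_PER_PAGE = 10;

    // pageIndex is zero-based: 0 is the first page, 1 is the second page.
    // language is an interface language code such as "en" (assumed `hl` parameter).
    static String build(String query, int pageIndex, String language) {
        if (pageIndex < 0) {
            throw new IllegalArgumentException("pageIndex must not be negative");
        }
        try {
            return "https://www.google.com/search?q=" + URLEncoder.encode(query, "UTF-8")
                    + "&start=" + (pageIndex * RESULTS_PER_PAGE)
                    + "&hl=" + URLEncoder.encode(language, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("xxoo", 0, "en")); // URL for the first page
        System.out.println(build("xxoo", 1, "en")); // URL for the second page
    }
}
```

The resulting string could then be passed to `browser.getPage(...)` in the same way the question's code opens the start page. If the assumed parameters hold, this would also address the language question, since the language is chosen per request rather than through a settings page.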