HtmlUnit doesn't wait for JavaScript

Posted 2024-10-30 03:54:53

I have a GWT-based page and would like to create an HTML snapshot of it using HtmlUnit.
The page loads product information via Ajax/JavaScript, so a "Loading..." message is shown for about one second before the content appears.

The problem is that HtmlUnit doesn't seem to capture the loaded content; all I get is the "Loading..." span.

Below is some experimental HtmlUnit code in which I try to give the page enough time to load its data. It doesn't seem to change anything, and I am still unable to capture the data loaded by the GWT JavaScript.

        WebClient webClient = new WebClient();
        webClient.setJavaScriptEnabled(true);
        webClient.setThrowExceptionOnScriptError(false);
        webClient.setAjaxController(new NicelyResynchronizingAjaxController()); 

        WebRequest request = new WebRequest(new URL("<my_url>"));
        HtmlPage page = webClient.getPage(request);

        int i = webClient.waitForBackgroundJavaScript(1000);

        while (i > 0)
        {
            i = webClient.waitForBackgroundJavaScript(1000);

            if (i == 0)
            {
                break;
            }
            synchronized (page) 
            {
                System.out.println("wait");
                page.wait(500);
            }
        }

        webClient.getAjaxController().processSynchron(page, request, false);

        System.out.println(page.asXml());

Any ideas...?

Comments (4)

扭转时空 2024-11-06 03:54:53

Thank you for responding.
I should have reported this sooner: I found the solution myself.
Apparently, when WebClient is initialised with FF:

WebClient webClient = new WebClient(BrowserVersion.FIREFOX_3_6);

it seems to work.
When WebClient is initialised with the default constructor it emulates IE7. I guess FF has better support for Ajax and is the recommended emulator to use.

狼亦尘 2024-11-06 03:54:53

I believe that by default NicelyResynchronizingAjaxController only resynchronizes AJAX calls that were caused by a user action, which it detects by tracking the thread each call originated from. Perhaps the GWT-generated JavaScript is being called from some other thread that NicelyResynchronizingAjaxController does not want to wait for.

Try declaring your own AjaxController that synchronizes with everything, regardless of the originating thread:

webClient.setAjaxController(new AjaxController(){
    @Override
    public boolean processSynchron(HtmlPage page, WebRequest request, boolean async)
    {
        return true;
    }
});

赠佳期 2024-11-06 03:54:53

As the documentation states, waitForBackgroundJavaScript is experimental:

Experimental API: May be changed in next release and may not yet work perfectly!

The following approach has always worked for me, regardless of the BrowserVersion used:

int tries = 5;  // Maximum number of tries, to avoid an infinite loop
while (tries > 0 && aCondition) {
    tries--;
    synchronized(page) {
        page.wait(2000);  // How often to check
    }
}

Note that aCondition is whatever you're checking for, e.g.:

page.getElementById("loading-text-element").asText().equals("Loading...")
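
The bounded polling loop above can be wrapped in a small reusable helper. The following is only a minimal sketch, not part of HtmlUnit: the class name PollingWait and the method waitWhile are hypothetical, and it uses Thread.sleep in place of the synchronized page.wait(...) call from the answer, on the assumption that the pause is only there to pass time between checks.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper (not an HtmlUnit class): bounded polling on a "still loading" check. */
public final class PollingWait {
    private PollingWait() {}

    /**
     * Re-checks stillLoading up to maxTries times, sleeping intervalMillis after each
     * check that still reports loading. Returns true once the condition has cleared,
     * false if the tries ran out or the thread was interrupted.
     */
    public static boolean waitWhile(BooleanSupplier stillLoading, int maxTries, long intervalMillis) {
        for (int tries = maxTries; tries > 0; tries--) {
            if (!stillLoading.getAsBoolean()) {
                return true;            // loading indicator is gone
            }
            try {
                Thread.sleep(intervalMillis);   // assumption: stands in for page.wait(...)
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();   // preserve the interrupt flag
                return false;
            }
        }
        // One last check after the final pause.
        return !stillLoading.getAsBoolean();
    }
}
```

With HtmlUnit, the condition from the answer could be passed as a lambda, for example PollingWait.waitWhile(() -> page.getElementById("loading-text-element").asText().equals("Loading..."), 5, 2000). The boolean result lets the caller tell a finished load apart from a give-up, which the bare loop does not report.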

離人涙 2024-11-06 03:54:53

None of the solutions provided so far worked for me. I ended up combining Dan Alvizu's solution with my own hack:

private WebClient webClient = new WebClient();

public void scrapPage() {
    makeWebClientWaitThroughJavaScriptLoadings();
    HtmlPage page = login();
    //do something that causes JavaScript loading
    waitOutLoading(page);
}

private void makeWebClientWaitThroughJavaScriptLoadings() {
    webClient.setAjaxController(new AjaxController(){
        @Override
        public boolean processSynchron(HtmlPage page, WebRequest request, boolean async)
        {
            return true;
        }
    });
}

private void waitOutLoading(HtmlPage page) {
    while(page.asText().contains("Please wait while loading!")){
        webClient.waitForBackgroundJavaScript(100);
    }
}

Needless to say, "Please wait while loading!" should be replaced with whatever text is shown while your page is loading. If there is no such text, you may be able to check for the presence of a loading GIF instead (if one is used). Of course, if you're feeling adventurous, you could simply pass a large enough milliseconds value.
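
One caveat with waitOutLoading as written: if the loading text never disappears, the loop never ends. A possible variation adds an overall time budget. This is a sketch under assumptions, not HtmlUnit API: the class LoadingTextWait and its parameters are hypothetical, the page text is supplied through a Supplier so the logic stays independent of the library, and the pause between checks is passed in as a Runnable.

```java
import java.util.function.Supplier;

/** Hypothetical helper (not an HtmlUnit class): wait for a loading text to vanish, with a timeout. */
public final class LoadingTextWait {
    private LoadingTextWait() {}

    /**
     * Calls pause between checks until pageText no longer contains loadingText.
     * Returns true if the text disappeared, false if timeoutMillis elapsed first.
     */
    public static boolean waitUntilTextGone(Supplier<String> pageText, String loadingText,
                                            long timeoutMillis, Runnable pause) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (pageText.get().contains(loadingText)) {
            if (System.nanoTime() - deadline >= 0) {
                return false;   // time budget used up, loading text still present
            }
            pause.run();        // e.g. let background JavaScript make progress
        }
        return true;
    }
}
```

In the scraper above this might be called as LoadingTextWait.waitUntilTextGone(() -> page.asText(), "Please wait while loading!", 10000, () -> webClient.waitForBackgroundJavaScript(100)), where the 10-second budget is an arbitrary illustrative value.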
