How to load AJAX content with HtmlUnit?

Posted on 2024-11-26 07:06:53

import java.io.IOException;
import java.net.MalformedURLException;
import java.util.List;

import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
import com.gargoylesoftware.htmlunit.html.HtmlButton;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlTextInput;

public class YoutubeBot {
    private static final String YOUTUBE = "http://www.youtube.com";

    @SuppressWarnings("unchecked")
    public static void main(String[] args) throws FailingHttpStatusCodeException, MalformedURLException, IOException {
        WebClient webClient = new WebClient();
        webClient.setThrowExceptionOnScriptError(false);

        // Load a YouTube results page directly (equivalent to typing this URL into the browser's address bar)
        HtmlPage currentPage = webClient.getPage("http://www.youtube.com/results?search_type=videos&search_query=official+music+video&search_sort=video_date_uploaded&suggested_categories=10%2C24&uni=3");

        // Get the form in which the submit button is located
        HtmlForm searchForm = (HtmlForm) currentPage.getElementById("masthead-search");

        // Get the input field
        HtmlTextInput searchInput = (HtmlTextInput) currentPage.getElementById("masthead-search-term");
        // Insert the search term
        searchInput.setText("java");

        // Workaround: create a 'fake' button and add it to the form
        HtmlButton submitButton = (HtmlButton) currentPage.createElement("button");
        submitButton.setAttribute("type", "submit");
        searchForm.appendChild(submitButton);

        // Workaround: use the reference to the button to submit the form
        HtmlPage newPage = submitButton.click();

        // Find all links with the given class on the results page (newPage, not the original currentPage)
        final List<HtmlAnchor> listLinks = (List<HtmlAnchor>) newPage.getByXPath("//a[@class='ux-thumb-wrap result-item-thumb']");

        // Print all links to the console
        for (HtmlAnchor link : listLinks) {
            System.out.println(YOUTUBE + link.getAttribute("href"));
        }
    }
}

This code works, but I want to sort the YouTube clips, for example by upload date. How can I do this with HtmlUnit? I have to click on the filter, which should load content via an AJAX request, and then click the "Upload date" link. I just don't know how to do the first step, loading the AJAX content. Is this possible with HtmlUnit?


美胚控场 2024-12-03 07:06:53

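This worked for me. Set this:

webClient.setAjaxController(new NicelyResynchronizingAjaxController());

This makes AJAX calls that are started from the main thread (that is, in direct response to a user action such as a click) run synchronously, so their results are normally already in the page by the time click() returns.

This is how I set up my WebClient object: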

import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.NicelyResynchronizingAjaxController;
import com.gargoylesoftware.htmlunit.WebClient;

WebClient webClient = new WebClient(BrowserVersion.CHROME);
webClient.getOptions().setJavaScriptEnabled(true);
webClient.getOptions().setCssEnabled(false);
webClient.getOptions().setUseInsecureSSL(true);
webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.getCookieManager().setCookiesEnabled(true);
// Run AJAX calls triggered by user actions synchronously
webClient.setAjaxController(new NicelyResynchronizingAjaxController());

旧人哭 2024-12-03 07:06:53

Here's one way to do it:

  1. Search the page as you did in your previous question.
  2. Select the search-lego-refinements block by its id.
  3. Use XPath to navigate to the link (.//ul/li/a, evaluated relative to the element found in step 2).
  4. Click the selected link.

The following code sample shows how it could be done:

import java.io.IOException;
import java.net.MalformedURLException;
import java.util.List;

import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
import com.gargoylesoftware.htmlunit.html.HtmlButton;
import com.gargoylesoftware.htmlunit.html.HtmlElement;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlTextInput;

public class YoutubeBot {
   private static final String YOUTUBE = "http://www.youtube.com";

   @SuppressWarnings("unchecked")
   public static void main(String[] args) throws FailingHttpStatusCodeException, MalformedURLException, IOException {
      WebClient webClient = new WebClient();
      webClient.setThrowExceptionOnScriptError(false);

      // This is equivalent to typing youtube.com into the address bar of the browser
      HtmlPage currentPage = webClient.getPage(YOUTUBE);

      // Get form where submit button is located
      HtmlForm searchForm = (HtmlForm) currentPage.getElementById("masthead-search");

      // Get the input field
      HtmlTextInput searchInput = (HtmlTextInput) currentPage.getElementById("masthead-search-term");

      // Insert the search term
      searchInput.setText("java");

      // Workaround: create a 'fake' button and add it to the form
      HtmlButton submitButton = (HtmlButton) currentPage.createElement("button");
      submitButton.setAttribute("type", "submit");
      searchForm.appendChild(submitButton);

      // Workaround: use the reference to the button to submit the form.
      currentPage = submitButton.click();

      // Get the div containing the filters
      HtmlElement filterDiv = currentPage.getElementById("search-lego-refinements");

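      // Select the first link inside the filter block (Upload date).
      // The leading '.' keeps the XPath relative to filterDiv; a bare '//' would search the whole page.
      HtmlAnchor sortByDateLink = ((List<HtmlAnchor>) filterDiv.getByXPath(".//ul/li/a")).get(0);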

      // Click the 'Upload date' link
      currentPage = sortByDateLink.click();

      System.out.println(currentPage.asText());
   }
}
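A caveat about the XPath used with filterDiv: in XPath 1.0 a path that starts with // is evaluated from the root of the document, even when the query is run on an element, so filterDiv.getByXPath("//ul/li/a") can return links from an unrelated list earlier in the page; a leading . keeps the path relative to the element. The sketch below demonstrates this with the JDK's own javax.xml.xpath rather than HtmlUnit, on a made-up, heavily simplified snippet (the markup and the class name XPathContextDemo are hypothetical and only for illustration):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class XPathContextDemo {
    // Hypothetical, simplified markup: one list outside the filter block, one inside it.
    static final String PAGE =
          "<html><body>"
        + "<ul><li><a href='/elsewhere'>Some other link</a></li></ul>"
        + "<div id='search-lego-refinements'>"
        + "<ul><li><a href='/upload-date'>Upload date</a></li></ul>"
        + "</div>"
        + "</body></html>";

    // Evaluates an XPath expression with the filter <div> as the context node.
    static NodeList fromFilterDiv(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(PAGE.getBytes(StandardCharsets.UTF_8)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Locate the filter block first, similar to getElementById in the answer.
            Element filterDiv = (Element) xpath.evaluate(
                    "//div[@id='search-lego-refinements']", doc, XPathConstants.NODE);
            return (NodeList) xpath.evaluate(expression, filterDiv, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the href of the first match, i.e. what .get(0) would pick.
    static String firstHref(String expression) {
        return ((Element) fromFilterDiv(expression).item(0)).getAttribute("href");
    }
}
```

Here "//ul/li/a" matches both links and its first hit is /elsewhere, while ".//ul/li/a" matches only the link inside the filter block.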

You could also just load the correct query URL directly (http://www.youtube.com/results?search_type=videos&search_query=nyan+cat&search_sort=video_date_uploaded).

In that case, however, you have to URL-encode the search parameter(s) yourself (for example, replace spaces with +).
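The encoding step can be delegated to the JDK's URLEncoder, which applies form encoding (spaces become +, characters such as & are percent-encoded). A minimal sketch, assuming the query parameters from the example URL above (the class and method names are made up, and whether YouTube still honours these parameters is not verified):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class YoutubeSearchUrl {
    private static final String YOUTUBE = "http://www.youtube.com";

    // Builds the "sorted by upload date" results URL for an arbitrary search term.
    // Parameter names are copied from the example URL; they are not an official API.
    static String byUploadDate(String searchTerm) {
        try {
            // Form encoding: ' ' -> '+', '&' -> "%26", and so on.
            String encoded = URLEncoder.encode(searchTerm, "UTF-8");
            return YOUTUBE + "/results?search_type=videos&search_query=" + encoded
                    + "&search_sort=video_date_uploaded";
        } catch (UnsupportedEncodingException e) {
            // Every JVM is required to support UTF-8, so this should never happen.
            throw new IllegalStateException(e);
        }
    }
}
```

The resulting string can then be passed straight to webClient.getPage(...), as the question's code already does with a hand-written URL.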

居里长安 2024-12-03 07:06:53

I've played with HtmlUnit before for similar purposes.

Actually, you can find all the information you need here. HtmlUnit has AJAX support enabled by default, so once you have the newPage object in your code you can issue click events on the page (find the specific element and call its click() method). The trickiest part is that AJAX is asynchronous, so after performing the virtual click you have to pause (e.g. with Thread.sleep()) so that the JavaScript code on the site can process the action.

This is not the best approach, since network latency makes a fixed sleep() unreliable: too short and the content is not there yet, too long and you waste time. Instead, you may find something on the page that changes when an event triggers the AJAX calls (e.g. a header title changes), and check periodically whether that change has happened yet. (I should mention that HtmlUnit has a built-in event resynchronizer, but I couldn't get it to work the way I expected.)

I use Firebug or Chrome's developer tools to examine the site. You can inspect the DOM tree before and after the AJAX calls; that way you'll know how to reference specific controls (like links and drop-down menus) on the page.
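The "check periodically whether the change has happened" idea can be wrapped in a small, library-independent polling helper. This is a minimal sketch; the class name PollUntil and its parameters are made up for illustration and are not part of HtmlUnit:

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

class PollUntil {
    // Re-checks the condition every intervalMillis until it holds or timeoutMillis has passed.
    // Returns true as soon as the condition holds, false on timeout or interruption.
    static boolean waitFor(BooleanSupplier condition, long timeoutMillis, long intervalMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                return false;
            }
        }
        return true;
    }
}
```

With HtmlUnit the condition might hypothetically be something like () -> page.getTitleText().contains("..."), where page is the HtmlPage you clicked on; which element actually changes depends on the site. HtmlUnit's WebClient also offers waitForBackgroundJavaScript(long timeoutMillis), which waits for pending background JavaScript jobs instead of a page-specific condition.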

Then I would use XPath to get specific elements; for example, you can do this (from HtmlUnit's examples):

//get div which has a 'name' attribute of 'John'
final HtmlDivision div = (HtmlDivision) page.getByXPath("//div[@name='John']").get(0);

YouTube actually doesn't use AJAX to re-sort its results. When you click the sort drop-down on the results page (this is a decorated <button>), an absolutely positioned <ul> shows up (it emulates the drop-down part of a combo box), with an <li> element for each menu item. Each <li> contains a special <span> element with an href attribute attached. When you click the <span>, JavaScript navigates the browser to this href value.

For example, in my case the "sort by relevance" <span> element looks like this:

<span href="/results?search_type=videos&search_query=test&suggested_categories=2%2C24%2C10%2C1%2C28" class=" yt-uix-button-menu-item" onclick=";window.location.href=this.getAttribute('href');return false;">Relevancia</span>
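Since the onclick handler only sets window.location.href to the span's href attribute, one option is to read that attribute and load the URL yourself. The href is relative, so it has to be resolved against the URL of the current page. A minimal JDK-only sketch of that resolving step (the class and method names are hypothetical):

```java
import java.net.URI;

class SpanHref {
    // Resolves a (possibly relative) href against the URL of the page it was found on,
    // the same way a browser resolves a relative URL assigned to window.location.href.
    static String toAbsolute(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }
}
```

For the span above, resolving its href against the current results-page URL yields http://www.youtube.com/results?search_type=videos&search_query=test&suggested_categories=2%2C24%2C10%2C1%2C28, which could then be passed to webClient.getPage(...).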

You can get the list of these spans relatively easily, since the hosting <ul> is the only such child of <body>. However, you have to click the drop-down button first, because it is JavaScript that creates the <ul> element and all the children described above. You can get the sort-by button with this XPath:

//div[@class='sort-by floatR']/button

You can test your XPath queries right in Chrome: open the developer tools, switch to the JavaScript console, and test them with $x() like this:

>  $x("//div[@class='sort-by floatR']/button")

[
<button type="button" class=" yt-uix-button yt-uix-button-text yt-uix-button-active" onclick=";return false;" role="button" aria-pressed="true" aria-expanded="true" aria-haspopup="true" aria-activedescendant data-button-listener="26">…</button>
]

Hope this points you in the right direction.
