How to load AJAX content with HtmlUnit?
import java.io.IOException;
import java.net.MalformedURLException;
import java.util.List;

import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
import com.gargoylesoftware.htmlunit.html.HtmlButton;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlTextInput;

public class YoutubeBot {
    private static final String YOUTUBE = "http://www.youtube.com";

    public static void main(String[] args) throws FailingHttpStatusCodeException, MalformedURLException, IOException {
        WebClient webClient = new WebClient();
        // Keep going even if a script on the page throws an error
        webClient.setThrowExceptionOnScriptError(false);

        // This is equivalent to typing the search URL into the address bar of a browser
        HtmlPage currentPage = webClient.getPage("http://www.youtube.com/results?search_type=videos&search_query=official+music+video&search_sort=video_date_uploaded&suggested_categories=10%2C24&uni=3");

        // Get the form where the submit button is located
        HtmlForm searchForm = (HtmlForm) currentPage.getElementById("masthead-search");

        // Get the input field and insert the search term
        HtmlTextInput searchInput = (HtmlTextInput) currentPage.getElementById("masthead-search-term");
        searchInput.setText("java");

        // Workaround: create a 'fake' submit button and add it to the form
        HtmlButton submitButton = (HtmlButton) currentPage.createElement("button");
        submitButton.setAttribute("type", "submit");
        searchForm.appendChild(submitButton);

        // Workaround: use the reference to the button to submit the form
        HtmlPage newPage = submitButton.click();

        // Find all links with the given class on the page returned by the search
        // (the original looked them up on currentPage, i.e. the page loaded before submitting)
        @SuppressWarnings("unchecked")
        final List<HtmlAnchor> listLinks = (List<HtmlAnchor>) newPage.getByXPath("//a[@class='ux-thumb-wrap result-item-thumb']");

        // Print all links to the console
        for (HtmlAnchor link : listLinks) {
            System.out.println(YOUTUBE + link.getAttribute("href"));
        }
    }
}
This code works, but I want to sort the YouTube clips, for example by upload date. How can I do this with HtmlUnit? I have to click the filter, which should load content through an AJAX request, and then click the "Upload date" link. I just don't know how to do the first step, loading the AJAX content. Is this possible with HtmlUnit?
Comments (4)
This worked for me. Set this:
This would cause all AJAX calls to be synchronous.
This is how I set up my WebClient object:
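The snippets this answer refers to are not shown, so the following is only a guess at the kind of setup it describes, not the answerer's actual code. It assumes the setting in question is HtmlUnit's NicelyResynchronizingAjaxController, which runs AJAX calls issued from the main thread (for example, directly in response to a click) synchronously. The class name AjaxClientSetup and the method newClient are made up for illustration.

```java
import com.gargoylesoftware.htmlunit.NicelyResynchronizingAjaxController;
import com.gargoylesoftware.htmlunit.WebClient;

public class AjaxClientSetup {

    // Hypothetical helper: builds a WebClient configured for AJAX-heavy pages.
    static WebClient newClient() {
        WebClient webClient = new WebClient();

        // Same setting as in the question. Newer HtmlUnit releases expose it as
        // webClient.getOptions().setThrowExceptionOnScriptError(false) instead.
        webClient.setThrowExceptionOnScriptError(false);

        // Assumption: this is the "set this" step. AJAX calls started from the
        // main thread are then executed synchronously.
        webClient.setAjaxController(new NicelyResynchronizingAjaxController());

        return webClient;
    }

    public static void main(String[] args) {
        WebClient webClient = newClient();
        System.out.println(webClient.getAjaxController().getClass().getSimpleName());
    }
}
```

Scripts that start from timers or other background jobs are not covered by this controller. For those, calling webClient.waitForBackgroundJavaScript(timeoutMillis) after the click is a commonly used complement.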
Here's one way to do it: select the search-lego-refinements block by id, then use XPath to get the links inside it (//ul/li/a when you start from the previous id).
The following code sample shows how it could be done:
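The original code sample is not available here, so this is a minimal sketch of the steps just described rather than the answerer's code. It assumes the page still contains an element with the id search-lego-refinements and a link whose text is "Upload date" (both taken from this thread; YouTube's markup may well have changed since). The class name SortByUploadDate is hypothetical.

```java
import java.util.List;

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
import com.gargoylesoftware.htmlunit.html.HtmlElement;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class SortByUploadDate {

    public static void main(String[] args) throws Exception {
        WebClient webClient = new WebClient();
        webClient.setThrowExceptionOnScriptError(false);

        HtmlPage results = webClient.getPage(
                "http://www.youtube.com/results?search_type=videos&search_query=java");

        // Step 1: select the refinements block by id (getElementById returns null if absent)
        HtmlElement refinements = (HtmlElement) results.getElementById("search-lego-refinements");
        if (refinements == null) {
            System.out.println("Refinements block not found; the page layout may differ.");
            return;
        }

        // Step 2: collect the links below that block. The leading '.' keeps the
        // XPath relative to the block instead of searching the whole document.
        List<?> links = refinements.getByXPath(".//ul/li/a");

        // Step 3: click the link whose text matches the wanted sort order
        for (Object node : links) {
            HtmlAnchor link = (HtmlAnchor) node;
            if ("Upload date".equals(link.getTextContent().trim())) {
                HtmlPage sorted = link.click();
                System.out.println("Now on: " + sorted.getUrl());
                break;
            }
        }
    }
}
```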
You could just browse the correct query URL as well (http://www.youtube.com/results?search_type=videos&search_query=nyan+cat&search_sort=video_date_uploaded). But then you would have to encode your search parameter(s) (replace spaces with + for example).
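The encoding step can be handled by the JDK's URLEncoder, which produces form-style encoding (spaces become +). The sketch below builds the query URL from the answer above; the class name YoutubeSearchUrl and the method build are made up for illustration, and the parameter names are simply copied from that URL.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class YoutubeSearchUrl {

    private static final String RESULTS = "http://www.youtube.com/results";

    // Builds a results URL sorted by upload date for the given search text.
    static String build(String query) {
        String encoded;
        try {
            // Form-style encoding: ' ' -> '+', other reserved characters -> %XX
            encoded = URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to be available on every JVM
            throw new IllegalStateException(e);
        }
        return RESULTS
                + "?search_type=videos"
                + "&search_query=" + encoded
                + "&search_sort=video_date_uploaded";
    }

    public static void main(String[] args) {
        System.out.println(build("nyan cat"));
    }
}
```

The resulting string can be passed straight to webClient.getPage(...), which avoids clicking through the sort menu altogether.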
I've played with HtmlUnit earlier for similar purposes.

Actually, you can find all the information you need here. HtmlUnit has AJAX support enabled by default, so once you get the newPage object in your code you can issue click events on the page (find the specific element and call its click() function). The trickiest part is that AJAX is asynchronous, so you have to call wait() or sleep() after performing the virtual click so that the JavaScript code on the site can process the action. This is not the best approach, since network latency makes sleep() unreliable. You may find something on the page that changes when you trigger an event that makes AJAX calls (e.g. a header title changes), so you can check regularly whether this change has already happened. (I should mention that there is an event resynchronizer built into HtmlUnit, but I couldn't manage to make it work the way I expected.)

I use Firebug or Chrome's developer tools for examining the site. You can inspect the DOM tree before and after the AJAX calls, and this way you'll know how to reference specific controls (like links and dropdown menus) on the page.

I would then use XPath to get specific elements, e.g. you can do this (from HtmlUnit's examples):
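To tie together the two ideas in this answer (polling for a change instead of one fixed sleep(), and locating elements with XPath), here is a small polling helper in plain Java. It is a sketch, not part of HtmlUnit: the names Polling, waitUntil, timeoutMillis and pollMillis are made up for illustration.

```java
import java.util.function.BooleanSupplier;

class Polling {

    // Checks the condition every pollMillis until it holds (returns true)
    // or until timeoutMillis has elapsed (returns false).
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.currentTimeMillis() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in condition: becomes true roughly 200 ms after start
        boolean met = waitUntil(() -> System.currentTimeMillis() - start >= 200, 2000, 50);
        System.out.println("condition met: " + met);
    }
}
```

With HtmlUnit, the condition would typically be an XPath check on the page returned by the click, for example `() -> !page.getByXPath("//ul/li/a").isEmpty()`, where page and the XPath expression are placeholders for whatever changes on your target site.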
YouTube doesn't actually use AJAX to re-sort its results. When you click the sort dropdown on the results page (this is a decorated <button>), an absolutely positioned <ul> shows up (this emulates the drop-down part of the combo box), which has an <li> element for each menu item. The <li> elements contain a special <span> element with an href attribute attached. When you click the <span> element, JavaScript navigates the browser to this href value.

For example, in my case the sort by relevance <span> element looks like this:

You can get the list of these spans relatively easily, since the hosting <ul> is the only such child of <body>. You have to click the dropdown button first, though, because that is what creates the <ul> element and all the children described above using JavaScript. You can get the sort by button with this XPath:

You can test your XPath queries, for example, right in Chrome if you open the developer tools and the JavaScript console from its toolbar. Then you can test like this:

Hope this points you in the right direction.
http://htmlunit.sourceforge.net/faq.html#AJAXDoesNotWork