HtmlUnit: cannot retrieve the page after downloading a file

Posted on 2024-12-14 09:50:30

I'm having a strange problem with HtmlUnit in Java. I am using it to download some data from a website, and the process goes like this:

1 - Login

2 - For each element (cars)

----- 3 Search for car

----- 4 Download zip file from a link

The code:

Creation of the webclient:

webClient = new WebClient(BrowserVersion.FIREFOX_3_6);
webClient.setJavaScriptEnabled(true);
webClient.setThrowExceptionOnScriptError(false);
DefaultCredentialsProvider provider = new DefaultCredentialsProvider();
provider.addCredentials(USERNAME, PASSWORD);
webClient.setCredentialsProvider(provider);
webClient.setRefreshHandler(new ImmediateRefreshHandler());

  public void login() throws IOException
  {
    page = (HtmlPage) webClient.getPage(URL);
    HtmlForm form = page.getFormByName("formLogin");

    String user = USERNAME;
    String password = PASSWORD;

    // Enter login and password
    form.getInputByName("LoginSteps$UserName").setValueAttribute(user);
    form.getInputByName("LoginSteps$Password").setValueAttribute(password);

    // Click Login Button
    page = (HtmlPage) form.getInputByName("LoginSteps$LoginButton").click();

    webClient.waitForBackgroundJavaScript(3000);

    // Click on Campa area
    HtmlAnchor link = (HtmlAnchor) page.getElementById("ctl00_linkCampaNoiH");
    page = (HtmlPage) link.click();

    webClient.waitForBackgroundJavaScript(3000);
    System.out.println(page.asText());
  }

Search for car in website:

  private void searchCar(String _regNumber) throws IOException
  {
    // Open search window
    page = page.getElementById("search_gridCampaNoi").click();

    webClient.waitForBackgroundJavaScript(3000);

    // Write plate number
    HtmlInput element = (HtmlInput) page.getElementById("jqg1");
    element.setValueAttribute(_regNumber);

    webClient.waitForBackgroundJavaScript(3000);

    // Click on search
    HtmlAnchor anchor = (HtmlAnchor) page.getByXPath("//*[@id=\"fbox_gridCampaNoi_search\"]").get(0);
    page = anchor.click();

    webClient.waitForBackgroundJavaScript(3000);
    System.out.println(page.asText());
  }

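Download the zip file with the PDFs:

  // The method header was not shown in the post; this name and these
  // parameter types are assumed. _link is the anchor of the zip link.
  private void downloadPdfs(HtmlAnchor _link, String _regNumber)
  {
    try
    {
      InputStream is = _link.click().getWebResponse().getContentAsStream();
      // One sub-folder per car, named after its plate number
      File path = new File(new File(DOWNLOAD_PATH), _regNumber);
      if (!path.exists())
      {
        // mkdirs() also creates DOWNLOAD_PATH if it does not exist yet
        path.mkdirs();
      }
      writeToFile(is, new File(path, _regNumber + "_pdfs.zip"));
    }
    catch (Exception e)
    {
      e.printStackTrace();
    }
  }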
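
The download code calls a writeToFile helper that the post does not show. As a rough illustration only, here is a minimal sketch of what such a helper might look like using only the standard library. The class name DownloadSketch, the returned byte count, the automatic creation of parent folders, and the wrapping of IOException are all assumptions, not the poster's actual code.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

// Hypothetical helper class; the original post only shows the call site.
final class DownloadSketch {

    // Copies the whole stream into target, creating missing parent folders
    // and overwriting an existing file. Returns the number of bytes written.
    static long writeToFile(InputStream in, File target) {
        // try-with-resources closes the response stream even if the copy fails
        try (InputStream source = in) {
            File parent = target.getParentFile();
            if (parent != null) {
                Files.createDirectories(parent.toPath());
            }
            return Files.copy(source, target.toPath(), StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            // Assumption: rethrow unchecked so the caller's catch (Exception e) still sees it
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in data instead of a real HtmlUnit response stream
        File out = new File(System.getProperty("java.io.tmpdir"), "ABC123/ABC123_pdfs.zip");
        long bytes = writeToFile(new ByteArrayInputStream(new byte[] {1, 2, 3}), out);
        System.out.println("wrote " + bytes + " bytes to " + out);
    }
}
```

Closing the stream inside the helper means the caller does not have to remember to release the response body after each car.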

The problem:

The first car works fine and its zip file is downloaded, but as soon as I search for the next car and reach this line:

page = page.getElementById("search_gridCampaNoi").click();

I get this exception:

Exception in thread "main" java.lang.ClassCastException: com.gargoylesoftware.htmlunit.UnexpectedPage cannot be cast to com.gargoylesoftware.htmlunit.html.HtmlPage

After debugging, I've realized that the moment I make this call:

InputStream is = _link.click().getWebResponse().getContentAsStream();

the value returned by page.getElementById("search_gridCampaNoi").click() is no longer an HtmlPage. Judging by the exception, it is an UnexpectedPage wrapping the download's WebResponse, so instead of receiving a new page I receive the file that I already downloaded again.

Two debugger screenshots illustrated this (images not preserved):

First call: the return type is OK.

Second call: the return type has changed and I no longer receive an HtmlPage.

Thanks in advance!

余生再见 2024-12-21 09:50:30

Just in case someone encounters the same problem, I found a workaround. Changing the line:

InputStream is = _link.click().getWebResponse().getContentAsStream();

to

InputStream is = _link.openLinkInNewWindow().getWebResponse().getContentAsStream();

seems to do the trick. I still have problems when doing several iterations (sometimes it works, sometimes it doesn't), but at least I have something now.
