HtmlUnit getByXPath does not return image tags

Posted on 2025-01-03 17:59:31

I am trying to find all image tags on a specific page. An example page would be www.chapitre.com.

I am using the following code to search for all images on the page:

HtmlPage page = HTMLParser.parseHtml(webResponse, webClient.openWindow(null, "testwindow"));
List<?> imageList = page.getByXPath("//img");
ListIterator li = imageList.listIterator();

while (li.hasNext()) {
    HtmlImage image = (HtmlImage) li.next();
    URL url = new URL(image.getSrcAttribute());

    // For now, only load 1x1 pixels
    if (image.getHeightAttribute().equals("1") && image.getWidthAttribute().equals("1")) {
        System.out.println("This is an image: " + url + " from page " + webRequest.getUrl());
    }
}

This doesn't return all the image tags in the page. For example, an image tag with src="http://ace-lb.advertising.com/site=703223/mnum=1516/bins=1/rich=0/logs=0/betr=A2099=[+]LP2" width="1" height="1" should be captured, but it's not. Am I doing something wrong here?

Any help is really appreciated.

Cheers!


Answer by 我的影子我的梦, 2025-01-10 17:59:31

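That's because

URL url = new URL(image.getSrcAttribute());

is throwing an exception :) Most likely some img tags on the page have a relative src (for example /images/logo.gif) or an empty one. new URL(String) rejects such a value with a MalformedURLException ("no protocol"), and since nothing inside the while loop catches it, the loop stops at that image. Every image after it, including the 1x1 advertising pixel, is never checked.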
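A small JDK-only sketch of the failure and the usual fix: new URL(String) only accepts absolute URLs, while new URL(URL context, String spec) resolves a relative src against the page URL. The page URL and src values below are hypothetical examples, not taken from chapitre.com. Inside HtmlUnit, HtmlPage.getFullyQualifiedUrl(String) should do the same resolution against the page's base URL, assuming your HtmlUnit version provides it.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class RelativeSrcDemo {
    // True if new URL(src) accepts the string on its own (absolute URL with a known protocol)
    static boolean parsesAlone(String src) {
        try {
            new URL(src);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    // Resolves src against the page URL, similar to how a browser resolves <img src="...">
    static String resolve(String pageUrl, String src) {
        try {
            return new URL(new URL(pageUrl), src).toString();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Cannot resolve " + src, e);
        }
    }

    public static void main(String[] args) {
        String pageUrl = "http://www.chapitre.com/index.html"; // hypothetical page path
        String[] srcs = {"http://example.com/pixel.gif", "/images/logo.gif"}; // hypothetical src values
        for (String src : srcs) {
            System.out.println(src + " | parses alone: " + parsesAlone(src)
                    + " | resolved: " + resolve(pageUrl, src));
        }
    }
}
```

The absolute src parses either way; the relative one only works once it is resolved against the page, which is why the original loop died on it.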

Try this code:

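import java.net.MalformedURLException;
import java.net.URL;
import java.util.List;
// HtmlUnit 2.x package names; HtmlUnit 3.x and later moved these classes to org.htmlunit
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlImage;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class Main {
    public Main() throws Exception {
        WebClient webClient = new WebClient();
        // Older HtmlUnit releases exposed this as webClient.setJavaScriptEnabled(false)
        webClient.getOptions().setJavaScriptEnabled(false);
        HtmlPage page = webClient.getPage("http://www.chapitre.com");
        List<HtmlImage> imageList = (List<HtmlImage>) page.getByXPath("//img");
        for (HtmlImage image : imageList) {
            try {
                // Throws MalformedURLException for a relative or empty src
                new URL(image.getSrcAttribute());
                if (image.getHeightAttribute().equals("1") && image.getWidthAttribute().equals("1")) {
                    System.out.println(image.getSrcAttribute());
                }
            } catch (MalformedURLException e) {
                // Skip only this image instead of aborting the whole loop
                System.out.println("You didn't see this coming :) skipping src: " + image.getSrcAttribute());
            }
        }
    }

    public static void main(String[] args) throws Exception {
        new Main();
    }
}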

You can even select only those 1x1 pixel images directly with XPath.
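A minimal sketch of that XPath, assuming the pixels are marked with literal width="1" and height="1" attributes (an image sized only via CSS would not match). With HtmlUnit you would pass the same string to page.getByXPath(...); to keep the example self-contained, it runs the expression with the JDK's own javax.xml.xpath on a small hypothetical XHTML snippet instead of a live page.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PixelXPathDemo {
    // The same expression could be passed to HtmlUnit's page.getByXPath(...)
    static final String PIXEL_XPATH = "//img[@width='1' and @height='1']";

    // Returns the src of every <img width="1" height="1"> in a well-formed XHTML string
    static List<String> pixelSrcs(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(PIXEL_XPATH, doc, XPathConstants.NODESET);
            List<String> srcs = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                srcs.add(((Element) nodes.item(i)).getAttribute("src"));
            }
            return srcs;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate XPath on input", e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<html><body>"
                + "<img src='/images/logo.gif' width='120' height='40'/>"
                + "<img src='http://example.com/pixel.gif' width='1' height='1'/>"
                + "</body></html>";
        System.out.println(pixelSrcs(xhtml)); // [http://example.com/pixel.gif]
    }
}
```

Filtering in the XPath means the Java loop no longer needs the width/height check, though the src still has to be resolved if it might be relative.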

Hope this helps.
